Skip to main content

HA Failover Capacity



We get an enormous amount of questions about VMware’s HA (High Availability), especially when users see a message stating there are Insufficient resources to satisfy HA failover. We have already discussed the mechanism that HA uses to provide high availability here. Now we need to understand capacity calculations. In current versions of ESX (3.02) and earlier the following calculation applies for failover capacity.
HA Failover Capacity

Failover Capacity is determined using a slot size value that is calculated on the cluster. Slots are calculated by a combination of the total CPU and Memory that are in the physical hosts. The calculation for failover capacity works as follows:

Let’s say you have 4 ESX servers in your VMware HA cluster and Configured Failover capacity on the cluster is set to 1.

Physical memory in the hosts is as follows:

ESX1 = 16 GB
ESX2 = 24 GB
ESX3 = 32 GB
ESX4 = 32 GB

In the cluster you have 24 VM’s each configured and running. Of the 24 VM’s running, determine the VM which has the highest “configured memory”. For this example let’s say this is 2GB. All other VMs are configured with less or equal to 2GB.

With this information we can now do the calculation:
1. Pick the ESX host which has the least amount of RAM. In this case it is ESX1 and the minimum amount of RAM is = 16 GB

2. Divide the value found in step 1 with value for the maximum RAM in a VM. In my example this gives us 8 (16 divided by 2). This means we have 8 slots available per ESX host in the cluster.

3. Since we have 4 hosts and the configured failover capacity for the cluster is 1, we are left with 3 hosts in a failure situation. Hence the total number of VMs that can be powered on these 3 servers is 24 VMs. (i.e. 8 multiplied by 3 = 24)

4. If the total number of VMs in the cluster exceeds 24 then it will give us “Insufficient resources to satisfy HA failover” and the “current failover capacity will be shown as 0″. If the number is less than 24, we should not get this message.

Note: If you are still seeing the message and you have less VM’s running than in the calculation allows for, check both the CPU and Memory reservations on both VM’s andresource pools, as this can skew the calculation. You should avoid unnecessary memory or cpu reservations on VM’s as this can cause these types of errors to occur, because we have to ensure that the resource is available.

There are multiple ways to fix, or get around this calculation. The most common are as follows:
• Set the “Allow Virtual Machines to be powered on even if they violate availability constraints” in the configuration of the cluster. In this case it ignores the above calculation and will try to power on as many VM’s as possible in case of HA failover. If this is the option chosen you can also set restart priority in the ‘Virtual Machine Options’ section of the cluster configuration. This way any high priority VM’s are powered on first, and then the lower priority up to the point where we cannot power any further VM’s on
• If you have one VM which is configured with a very high amount of memory, you can either lower its configured memory, or take it out of the cluster and run it on any other standalone ESX host. This will increase the number of slots available with the current hardware
• Increase the amount of RAM on servers so that there are more slots available with the current RAM reservations.
• Remove any CPU reservations on any VM(s) that are greater than the max speed of the processors in the hosts. For example if the CPU Usage on the summary tab of your ESXServer shows as follows:

Then you will see the error message popup if you have a CPU reservation greater than 2793MHz on a VM.
Note: The above calculation method is very limited and is going to be revised in future releases of VirtualCenter to improve calculations for HA failover.

Popular posts from this blog

AD LDS – Syncronizing AD LDS with Active Directory

First, we will install the AD LDS Instance: 1. Create and AD LDS instance by clicking Start -> Administrative Tools -> Active Directory Lightweight Directory Services Setup Wizard. The Setup Wizard appears. 2. Click Next . The Setup Options dialog box appears. For the sake of this guide, a unique instance will be the primary focus. I will have a separate post regarding AD LDS replication at some point in the near future. 3. Select A unique instance . 4. Click Next and the Instance Name dialog box appears. The instance name will help you identify and differentiate it from other instances that you may have installed on the same end point. The instance name will be listed in the data directory for the instance as well as in the Add or Remove Programs snap-in. 5. Enter a unique instance name, for example IDG. 6. Click Next to display the Ports configuration dialog box. 7. Leave ports at their default values unless you have conflicts with the default values. 8. Click N...

HOW TO EDIT THE BCD REGISTRY FILE

The BCD registry file controls which operating system installation starts and how long the boot manager waits before starting Windows. Basically, it’s like the Boot.ini file in earlier versions of Windows. If you need to edit it, the easiest way is to use the Startup And Recovery tool from within Vista. Just follow these steps: 1. Click Start. Right-click Computer, and then click Properties. 2. Click Advanced System Settings. 3. On the Advanced tab, under Startup and Recovery, click Settings. 4. Click the Default Operating System list, and edit other startup settings. Then, click OK. Same as Windows XP, right? But you’re probably not here because you couldn’t find that dialog box. You’re probably here because Windows Vista won’t start. In that case, you shouldn’t even worry about editing the BCD. Just run Startup Repair, and let the tool do what it’s supposed to. If you’re an advanced user, like an IT guy, you might want to edit the BCD file yourself. You can do this...

DNS Scavenging.

                        DNS Scavenging is a great answer to a problem that has been nagging everyone since RFC 2136 came out way back in 1997.  Despite many clever methods of ensuring that clients and DHCP servers that perform dynamic updates clean up after themselves sometimes DNS can get messy.  Remember that old test server that you built two years ago that caught fire before it could be used?  Probably not.  DNS still remembers it though.  There are two big issues with DNS scavenging that seem to come up a lot: "I'm hitting this 'scavenge now' button like a snare drum and nothing is happening.  Why?" or "I woke up this morning, my DNS zones are nearly empty and Active Directory is sitting in a corner rocking back and forth crying.  What happened?" This post should help us figure out when the first issue will happen and completely av...