Ballooning And Hypervisor Swapping – Common Misunderstandings

This post was triggered by a training session I had just given to some members of the Operations Team at a customer’s facilities. When you first encounter virtualization and dive deep into it, it is sometimes too complicated to get right from the beginning. This is especially true of memory resource management, and of memory reclamation in particular, where terms like ‘swap’ or ‘swapping’ are used for two different reclamation mechanisms. Remember that swapping can happen at the guest level and at the host level, and under certain conditions at both levels at the same time. The trainees came up with many questions on this part of the training, and I felt the need to sit down for a moment and rethink how I was explaining this particular matter.

The Basics

The virtual memory space, that is the guest’s memory space, is divided into blocks, typically 4KB, called pages. The physical memory, that is the host’s memory, is also divided into blocks, also typically 4KB. When host physical memory is full, the data for virtual pages that are not present in host physical memory is stored on disk. I say typically 4KB because ESX/ESXi also supports large pages of 2MB.

There are three memory layers in ESX/ESXi: guest virtual memory, guest physical memory, and host physical memory. Their relationships are illustrated in the picture below, taken from Understanding Memory Resource Management in VMware ESX 4.1.

Virtual memory levels (a) and memory address translation (b) in ESX

Each guest consumes memory based on its configured size, plus additional overhead memory for virtualization. The configured size is the amount of memory that is presented to the guest. It is independent of the amount of physical RAM actually allocated to the guest, which depends on resource settings such as shares, reservation and limit and, though not always, on the memory contention status of the host.

Also, your guests can use more memory than the host has physically available. For example, you can have a host with 4GB of physical memory and run three guests with 2GB of memory each. In that case, memory is overcommitted by about 2GB.
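The arithmetic behind that example can be sketched in a few lines of Python. The function name is made up for this post, and the sketch deliberately ignores the per-guest virtualization overhead memory, so treat it as an illustration rather than how ESX accounts for memory.

```python
def overcommitted_mb(host_mb, guest_sizes_mb):
    """Return by how many MB the configured guest memory exceeds host memory.

    Assumption: per-guest overhead memory is ignored for simplicity.
    """
    return max(0, sum(guest_sizes_mb) - host_mb)


# A 4GB host running three 2GB guests.
print(overcommitted_mb(4096, [2048, 2048, 2048]))  # 2048
```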

Memory Overcommitment

This overcommitment plays an important role in virtualization, where you share resources across guests, some of which may be idle while others run at capacity at a given point in time. The situation is highly dynamic: a guest can run at capacity for one hour and sit idle for the rest of the day.

To improve memory utilization, an ESX/ESXi host transfers memory from idle guests to guests that need more memory. To achieve that, ESX/ESXi 4.1 has four techniques: Memory Compression, Transparent Page Sharing (TPS), Ballooning and Hypervisor Swapping.

Now let’s focus on the misunderstandings with Ballooning and Hypervisor Swapping.

Ballooning

When the hypervisor runs multiple guests and the total amount of the free host memory becomes low, none of the guests will free guest physical memory because the guest OS cannot detect the host’s memory shortage. Guests run isolated from each other and don’t even know they are virtual machines.

My guess is that one of the misunderstandings comes from the picture below: the swap space in the picture tends to be read as an external datastore.

Ballooning makes the guest OS aware of the low memory status of the host. The balloon driver, aka vmmemctl, communicates with the hypervisor through a private channel. If the hypervisor needs to reclaim guest memory, it sets a proper target balloon size for the balloon driver, making it “inflate” by allocating guest physical pages within the guest.

Typically, the hypervisor inflates the guest balloon when it is under memory pressure. By inflating the balloon, the hypervisor transfers the memory pressure from the host to the guest. In response, the balloon driver allocates and pins guest physical memory. The guest operating system determines whether it needs to page out guest physical memory to satisfy the balloon driver’s allocation requests. If the guest has plenty of free guest physical memory, inflating the balloon will induce no paging and will not impact guest performance. However, if the guest is already under memory pressure, the guest OS decides which guest physical pages to page out to the virtual swap device, that is the OS’s own swap file or partition, in order to satisfy the balloon driver’s allocation requests. In a Windows environment, the virtual swap device is typically located on the C: drive.

The genius of Ballooning is that it allows the guest OS to intelligently make the hard decision about which pages to page out, without the hypervisor’s involvement.
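The two cases described above (plenty of free guest memory versus a guest already under pressure) can be captured in a toy model. This is only a sketch: the function name and the single "free memory" figure are simplifications I am assuming here, and a real guest OS makes this decision page by page with its own policies.

```python
def inflate_balloon(guest_free_mb, balloon_target_mb):
    """Toy model of balloon inflation inside one guest.

    Returns (taken_from_free_mb, paged_out_by_guest_mb).
    Assumption: the guest satisfies the balloon from free memory first
    and pages out to its own swap file only for the remainder.
    """
    taken_from_free = min(guest_free_mb, balloon_target_mb)
    paged_out = balloon_target_mb - taken_from_free
    return taken_from_free, paged_out


# Guest with plenty of free memory: no guest paging is induced.
print(inflate_balloon(1024, 512))  # (512, 0)

# Guest already under memory pressure: the guest OS must page out 256MB.
print(inflate_balloon(256, 512))   # (256, 256)
```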

You can limit the amount of memory the Ballooning driver (vmmemctl) reclaims by setting the sched.mem.maxmemctl parameter for a specific guest. This option specifies the maximum amount of memory that can be reclaimed from a guest, in megabytes (MB). Note that the Ballooning driver forces the OS to swap out physical memory pages to the guest’s own swap file/partition. Thus the swap file size is an important factor in how well the Ballooning driver can reclaim physical memory. Don’t forget that if Ballooning can’t reclaim enough memory, hypervisor swapping will kick in, and this one has an even bigger impact on performance.

This guest-level swap space must be greater than or equal to the difference between the guest physical memory size and its Reservation. I recommend setting sched.mem.maxmemctl to the size of your OS’s swap file, especially if that is lower than the guest physical memory minus its Reservation. The value of sched.mem.maxmemctl should never exceed the guest physical memory minus its Reservation. Here are several examples of setting sched.mem.maxmemctl:

Example #1
-Guest physical memory set to 2GB
-Guest page file set to 500MB
-sched.mem.maxmemctl set to 500MB

Example #2
-Guest physical memory set to 2GB
-Guest page file set to 3GB
-sched.mem.maxmemctl set to 2GB

Example #3
-Guest physical memory set to 2GB
-Memory reservation set to 1GB
-Guest page file set to 3GB
-sched.mem.maxmemctl set to 1GB

Example #4
-Guest physical memory set to 2GB
-Memory reservation set to 1GB
-Guest page file set to 500MB
-sched.mem.maxmemctl set to 500MB
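The rule of thumb behind these four examples is simply "the smaller of the guest page file and the guest physical memory minus its Reservation". Here is a minimal sketch of that rule; the function name is hypothetical and the result is only the value I would suggest, not something ESX computes for you.

```python
def recommended_maxmemctl_mb(guest_memory_mb, page_file_mb, reservation_mb=0):
    """Suggested balloon limit in MB, following the rule of thumb above.

    The limit is capped both by the guest's own page file and by the
    unreserved part of the guest physical memory.
    """
    if reservation_mb > guest_memory_mb:
        raise ValueError("reservation cannot exceed guest physical memory")
    unreserved_mb = guest_memory_mb - reservation_mb
    return min(page_file_mb, unreserved_mb)


# The four examples above, with 2GB = 2048MB and 3GB = 3072MB.
print(recommended_maxmemctl_mb(2048, 500))         # 500
print(recommended_maxmemctl_mb(2048, 3072))        # 2048
print(recommended_maxmemctl_mb(2048, 3072, 1024))  # 1024
print(recommended_maxmemctl_mb(2048, 500, 1024))   # 500
```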

N.B. VMware Tools must be installed into the guest OS in order to enable ballooning.

When is Ballooning not working?
– If it was never installed,
– If it is explicitly disabled,
– If it is not running (for example, whilst the guest operating system is booting),
– If it is temporarily unable to reclaim memory quickly enough to satisfy current system demands,
– If it is functioning properly, but the maximum balloon size is reached.

CAUTION If memory is overcommitted, and the guest OS is configured with insufficient swap space, the guest OS in the virtual machine can fail!

Hypervisor Swapping

In cases where Ballooning and TPS are not sufficient to reclaim memory, ESX employs Hypervisor Swapping. At guest startup, the hypervisor creates a separate swap file for the guest. This file, located in the guest’s home directory, has the extension .vswp. Then, if necessary, the hypervisor can directly swap out guest physical memory to that swap file, which frees host physical memory for other guests. The swap file size is set to the guest physical memory minus its Reservation. For example, if you allocated 4GB to a guest and set a Reservation of 1GB, the swap file size will be 3GB.

N.B. If the swap file cannot be created for any reason, the virtual machine won’t be able to power on.
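The .vswp sizing rule is easy to check when planning datastore capacity. A minimal sketch, with a function name of my own choosing:

```python
def vswp_size_mb(guest_memory_mb, reservation_mb=0):
    """Size of the hypervisor swap file: guest physical memory minus Reservation."""
    if reservation_mb > guest_memory_mb:
        raise ValueError("reservation cannot exceed guest physical memory")
    return guest_memory_mb - reservation_mb


# 4GB guest with a 1GB Reservation: 3GB swap file.
print(vswp_size_mb(4096, 1024))  # 3072

# A fully reserved guest needs no swap space at all.
print(vswp_size_mb(4096, 4096))  # 0
```

One consequence worth noting: summing this value over all guests on a datastore tells you how much space the swap files will need at power-on.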

Besides the limitation on the reclaimed memory size, both TPS and Ballooning take time to reclaim memory. Indeed, TPS speed depends on the page scan rate and the sharing opportunity, and Ballooning speed relies on the guest OS’s response time for memory allocation. Hypervisor Swapping, on the other hand, is a guaranteed technique to reclaim a specific amount of memory within a specific amount of time, but at a cost! It is used as a last resort to reclaim memory from the guest, and because there is no ‘genius’ behind this technique, we can observe the following performance issues:

Page selection problems: Under certain circumstances, hypervisor swapping may severely penalize guest performance. This occurs when the hypervisor has no knowledge about which guest physical pages should be swapped out, and the swapping may cause unintended interactions with the native memory management policies in the guest operating system.

N.B. ESX/ESXi mitigates the impact of interacting with guest OS memory management by randomly selecting the swapped guest physical pages.

Double paging problems: Another known issue is the double paging problem. Assuming the hypervisor swaps out a guest physical page, it is possible that the guest operating system pages out the same physical page, if the guest is also under memory pressure. This causes the page to be swapped in from the hypervisor swap device and immediately to be paged out to the virtual machine’s virtual swap device.
Page selection and double-paging problems exist because the information needed to avoid them is not available to the hypervisor.

N.B. There is no known way to mitigate this issue!

High swap-in latency: Swapping in pages is expensive for a VM. If the hypervisor swaps out a guest page and the guest subsequently accesses that page, the VM will get blocked until the page is swapped in from disk. High swap-in latency, which can be tens of milliseconds, can severely degrade guest performance.

N.B. The only trick to mitigate high swap-in latency is to place the hypervisor swap files, that is the .vswp files, on a high-speed, low-latency datastore, either local to the hypervisor with hardware such as OCZ RevoDrive or Fusion-io ioDrive PCIe cards, or on a SAN with an SSD Tier 1 datastore. Both use NAND flash technology for low latency, high speed and high IOPS.

N.B. As a side note, putting the hypervisor swap files on local storage can lead to a slight degradation in performance for VMware vMotion because pages swapped to a local swap file on the source host must be transferred across the network to the destination host.
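To get a feel for why swap-in latency hurts so much, and why a low-latency datastore helps, here is a back-of-the-envelope sketch. All the numbers are assumptions for illustration: 100ns for an access served from RAM, 10ms for a swap-in from a spinning disk (the "tens of milliseconds" order of magnitude) and 0.1ms for a swap-in from flash. They are not measurements.

```python
def average_access_ns(swapped_fraction, ram_ns=100.0, swap_in_ms=10.0):
    """Average access time when a fraction of accesses hit swapped-out pages.

    Assumed figures: ram_ns and swap_in_ms are illustrative defaults only.
    """
    swap_in_ns = swap_in_ms * 1_000_000
    return (1.0 - swapped_fraction) * ram_ns + swapped_fraction * swap_in_ns


# Suppose only 1 access in 1000 touches a hypervisor-swapped page.
for label, latency_ms in [("disk, assumed 10 ms", 10.0), ("flash, assumed 0.1 ms", 0.1)]:
    avg_ns = average_access_ns(0.001, swap_in_ms=latency_ms)
    print(f"{label}: about {avg_ns / 100.0:.0f}x the RAM-only access time")
```

Even with a tiny fraction of accesses going to swapped pages, the average is dominated by the swap-in latency, which is why the datastore behind the .vswp files matters.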

Summary

Ballooning happens inside the guest by leveraging the OS swap file(s) and its native memory management policies, with no guarantee that it will free up enough physical memory.
Hypervisor Swapping happens as a last resort. It is a guaranteed technique to free up host physical memory by randomly paging out guest physical memory to a .vswp file located by default in the virtual machine home directory.

I could talk about the other two memory reclamation techniques, TPS and the new Memory Compression, and also about the role of the IMT (Idle Memory Tax). Perhaps I will do a follow-up, but in this blog post I wanted to focus on common misunderstandings with Ballooning and Hypervisor Swapping.

I hope you enjoyed reading this article as much as I enjoyed writing it!

Sources: vSphere Resource Management Guide, Understanding Memory Resource Management in VMware ESX 4.1, Yellow-Bricks.com, Boche.net

11 Responses to Ballooning And Hypervisor Swapping – Common Misunderstandings

vMackFS says:

September 30, 2010 at 21:10

“In the cases where Ballooning (and TPS) are not sufficient to reclaim memory, ESX employs Hypervisor Swapping to reclaim memory.”

Doesn’t memory compression kick in before hypervisor swapping occurs?

- deinoscloud says:
  
  October 1, 2010 at 00:41
  
  Hi vMachFS,
  
  Thx for your comment.
  
  So yes indeed by default memory compression (MC) kicks in just before hypervisor swapping. Note that MC kicks in only when hypervisor swapping is called.
  
  MC is actually more of an enhancement to the hypervisor swapping technique and not really a memory reclamation technique on its own.
  
  So far MC is tied to the hypervisor swapping technique. Maybe in a future release MC will kick in based on guest virtual memory paging out to the guest OS’ swap file.
  
  Rgds,
  Didier
  
YP Chien says:

October 3, 2010 at 21:01

ESX only utilizes Memory Compression when the compression ratio is 2 or more, otherwise kernel swap is still employed. In our lab, we found that with MC, the potential performance hit due to swapping is significantly reduced (by examining the CPU memory latency with or without swapping).

- deinoscloud says:
  
  October 4, 2010 at 10:35
  
  Hi YP Chien,
  Thx for commenting!
  
  You’re right regarding the compression ratio.
  MC uses an empty 4KB page to store 2 other 4KB pages with data that are compressed at 50% (2:1) each at least.
  
  Did you test with different value for Mem.MemZipMaxPct ?
  
  Thx,
  Didier
  
  - YP Chien says:
    
    October 4, 2010 at 19:41
    
    Hi Didier:
    
    We have not tried to change the Mem.MemZipMaxPct setting. I am not sure if this would have any positive or negative effects. Instead, we plan to test if SSD would be helpful in cases that the ESX must resort to Kernel Swapping while ESX must reclaim more memory above beyond using Memory Balloon and Memory Compression. Stay tuned.
    
Rutvik Karve says:

March 24, 2011 at 16:43

thanx! the info. you posted is really helpful. 🙂

Muhammad Ahmed says:

December 1, 2011 at 16:40

Very useful information. Many thanks for the same.

sophiaajaz says:

October 11, 2015 at 16:50

Thank you for this information ! Really enjoyed reading it 🙂 (Y)
