Compared to ESX 3.5, the ESX 4.0 scheduler includes several new features and enhancements that help improve the throughput of this benchmark. Relaxed co-scheduling of vCPUs was already available in earlier releases of ESX but has been further fine-tuned in ESX 4.0. This is especially beneficial for efficient resource usage as the number of vCPUs in a virtual machine increases. In ESX 3.5, the scheduler would acquire a lock on a group of pCPUs within which the vCPUs of a virtual machine were to be scheduled. In ESX 4.0 this has been replaced with finer-grained locking, which reduces scheduling overhead in cases where frequent scheduling decisions are needed.
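The difference between locking a group of pCPUs and finer-grained locking can be illustrated with a minimal sketch. It is not the ESX implementation: the class names, the per-pCPU run queues, and the choice of one lock per pCPU as the finer granularity are all assumptions made for illustration.

```python
import threading
from collections import deque


class CoarseScheduler:
    """Hypothetical model: one lock guards every pCPU run queue in the group."""

    def __init__(self, num_pcpus):
        self.group_lock = threading.Lock()
        self.run_queues = [deque() for _ in range(num_pcpus)]

    def enqueue(self, pcpu, world):
        # Every scheduling decision in the group contends on the same lock.
        with self.group_lock:
            self.run_queues[pcpu].append(world)


class FineGrainedScheduler:
    """Hypothetical model: each pCPU run queue has its own lock (assumed granularity)."""

    def __init__(self, num_pcpus):
        self.locks = [threading.Lock() for _ in range(num_pcpus)]
        self.run_queues = [deque() for _ in range(num_pcpus)]

    def enqueue(self, pcpu, world):
        # Only decisions that target the same pCPU serialize with each other.
        with self.locks[pcpu]:
            self.run_queues[pcpu].append(world)
```

In the coarse model, a decision for one pCPU blocks decisions for all other pCPUs in the group. In the fine-grained model, decisions for different pCPUs can proceed concurrently, which matters most when scheduling decisions are frequent.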
Another significant improvement has been in the area of cache-aware scheduling of worlds. Worlds, in VMware terminology, are schedulable entities analogous to processes in conventional operating systems. A world is said to have migrated when it is scheduled on a core other than the one on which it last executed. When this happens, hardware cache miss rates increase. Cache miss rates can be minimized by always scheduling a world on the same core. However, this can delay the scheduling of worlds and lead to high CPU idle times. With intelligent world migrations, the scheduler in ESX 4.0 strikes a good balance between low CPU idle time and low cache miss rates. The scheduler also takes into account the processor's cache architecture. This is especially important in light of the differences between the various processor architectures on the market. Migration algorithms have also been enhanced to take into account the load on the vCPUs and physical CPUs. These changes have helped to significantly improve throughput for this benchmark.
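The trade-off between cache misses and idle time described above can be sketched as a simple cost comparison. This is an illustrative model only, not the ESX migration algorithm: the function name `choose_pcpu`, the use of run-queue length as a stand-in for load, the `cache_domain` mapping, and the penalty values are all hypothetical.

```python
def choose_pcpu(last_pcpu, queue_lengths, cache_domain,
                same_cache_penalty=1.0, cross_cache_penalty=3.0):
    """Pick the pCPU with the lowest estimated cost for a world.

    queue_lengths[i] -- worlds already waiting on pCPU i (assumed proxy for load)
    cache_domain[i]  -- id of the shared cache that pCPU i belongs to
    The penalties are made-up weights standing in for the cost of cache misses.
    """
    def cost(pcpu):
        if pcpu == last_pcpu:
            penalty = 0.0                      # cache is still warm, no migration
        elif cache_domain[pcpu] == cache_domain[last_pcpu]:
            penalty = same_cache_penalty       # migration within a shared cache
        else:
            penalty = cross_cache_penalty      # migration to a cold cache
        return queue_lengths[pcpu] + penalty

    return min(range(len(queue_lengths)), key=cost)


# Four pCPUs; pCPUs 0-1 share one cache and pCPUs 2-3 share another.
domains = [0, 0, 1, 1]
stay = choose_pcpu(2, [0, 0, 0, 0], domains)     # idle system: stay on the last pCPU
nearby = choose_pcpu(0, [2, 0, 0, 0], domains)   # busy last pCPU: move within the cache
far = choose_pcpu(0, [5, 5, 1, 0], domains)      # busy cache domain: cross to the other one
```

Under these assumed weights, a world stays put when its last pCPU is free, moves to a pCPU sharing the same cache when its own pCPU is moderately loaded, and crosses to another cache domain only when the load imbalance outweighs the higher penalty.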