VMware vSphere Fast Track Day#4 – Lessons Learned


This is my fourth day attending the VMware vSphere Fast Track training at the Jean Cordier Academy in Leuven, Belgium.
You can read about day one, two and three here. It is time to sum up what I have learned today, in no particular order.

  • The coffee sucks, but worse, for two days in a row trainees from other courses thought it was smart to eat all the cookies at the afternoon break, including ones that weren’t theirs 😦
  • My trainer, Bert De Bruijn,  still holds pole position!
  • Virtual CPU concept:
    • A VM can have up to 8 vCPUs, and you can now allocate them one by one, no longer only in powers of two,
    • When a vCPU must be scheduled, the VMKernel maps it to a Hardware Execution Context (HEC),
    • An HEC is a processor’s capacity to schedule one thread of execution.
    • [UPDATED]
    • With strict co-scheduling, a two-vCPU VM gets scheduled on 2 HECs at a time or not at all, and a four-vCPU VM gets scheduled on 4 HECs at a time or not at all. This can make SMP VMs slower than single-vCPU VMs due to pCPU contention, which is why ESX 3.x introduced relaxed co-scheduling techniques. In vSphere the co-scheduling is even more relaxed. Read more at VMware.com and in the vSphere Resource Management Guide.
    • The %CSTP column in the CPU statistics panel in esxtop shows the fraction of time the vCPUs of a VM spent in the “co-stopped” state, waiting to be “co-started”. A low number is OK (a small esxtop parsing sketch follows after this list).
    • [/UPDATED]
    • Enabling HyperThreading (HT) allows two HECs to execute on the same core at the same time, unless they need the same on-chip resource (e.g. the floating point unit),
    • HyperThreading must be enabled in the BIOS.
  • The VMKernel dynamically schedules VMs and the Service Console (if any) and avoids scheduling multiple vCPUs of the same VM on HECs of the same core. Every 20 ms the VMKernel looks for vCPUs to migrate from one HEC to another.
  • The Service Console always runs on the first HEC.
  • The VMKernel manages physical memory too. There is a virtualization overhead, e.g. a two-vCPU VM with 1 GB of memory incurs about 176 MB of overhead.
  • Memory overcommitment techniques allow VMs to be allocated more memory than the host physically has available. There are three techniques:
    • Transparent Page Sharing (TPS) detects identical pages in VMs’ memory and maps them to the same underlying physical page,
    • vmmemctl (a.k.a. the balloon driver) reclaims memory from selected VMs when physical RAM is scarce.
      • VMware Tools must be installed in guest VMs,
      • By default up to 65% of a VM’s memory can be reclaimed during ballooning; an advanced VMKernel setting, Mem.CtlMaxPercent, controls this value,
      • You can imagine that the OS’s pagefile.sys or Linux’s swap can be under heavy load when the Balloon Driver kicks in.
    • VMKernel Swap:
      • Created when the VM is powered on, deleted when it is powered off,
      • Its size equals the difference between the VM’s configured (maximum) memory and the memory guaranteed to it (its reservation), if any (see the worked sizing example after this list),
      • Placing the swap file on local SSDs can mitigate the performance hit, but that will reduce your ability to vMotion the VMs; shared SSDs are a better choice.
  • Guest OS monitoring tools:
    • Task Manager,
    • IOmeter,
    • VMware Tools includes a library of functions called the Perfmon DLL.
  • vCenter Server performance charts are available for hosts and VMs.
  • When a VM is CPU constrained, the key indicator is CPU ready time.
  • Disk performance problems are commonly caused by saturating the underlying physical storage. Machine disk usage (%) and I/O data counters provide information about average disk usage on a VM.
  • When a VM is network constrained, the key indicators are the DroppedTX and DroppedRX network counters.
  • AppSpeed is a virtual appliance for proactive application performance management.
  • To improve performance of a VM:
    • Use DRS to balance VM load across multiple hosts,
    • Use storage multipathing to balance the I/O load across multiple paths,
    • Use NIC teaming to balance network load across multiple pNICs,
    • Modify resource pool’s CPU and memory limits and reservations,
    • Modify the VM’s CPU and memory limits and reservations,
    • Use network traffic shaping to give a VM more bandwidth during peak hours.
  • resxtop/esxtop are utilities to examine real-time resource usage on ESX/ESXi hosts. Refer to Duncan’s blog for (r)esxtop values/thresholds.
  • vm-support can be used to collect performance snapshots for future analysis.
  • Alarms are notifications that occur in response to selected events or conditions on an object in the inventory.
  • There are many more pre-defined alarms in vSphere.
  • One of my favorites: an alarm can track a host’s hardware health state and fire different actions depending on the status. You could, for instance, vMotion the VMs away and put the host in maintenance mode if one of its dual PSUs failed.
  • Resource Management  is the allocation of resources from hosts and clusters to VMs. Resources include CPU, memory, storage and network.
  • Share values range from 1 to 1,000,000, which should satisfy the needs of very complex setups with many resource pools.
  • While it is a good practice to use reservations, limits on the other hand should be used carefully. Read Frank Denneman’s great article about shares, limits and reservations.
  • A resource pool is a logical abstraction for hierarchically managing CPU and memory resources. It is a best practice to set shares, limits and reservations through resource pools rather than directly on the VMs.
  • Depending on its parent resource pool, a VM with 1,000 CPU shares can carry more weight than another VM with the same number of CPU shares (see the shares example after this list).
  • Use expandable reservations carefully: a single child resource pool can consume all of its parent’s available resources, leaving nothing for the other child pools.
  • You can schedule a task to change the resource settings of a RP or a VM to accommodate peaks and business priorities for example.
  • Thin provisioning enables a VM to consume storage space only as it is actually needed.
  • vSphere provides alarms and reports to monitor thin-provisioned storage.
  • Thin provisioning is available both at the array level and at the VMDK level. Be sure you have proper monitoring in place.
  • It is a good practice to format your VMFS volumes with the largest possible block size, i.e. 8 MB, to limit SCSI locking when VMFS metadata gets updated. Read this great post from Chad Sakac.
  • Thin disks require VMFS-3.
  • Thin disks are not supported with FT.
  • A good tip: create a dummy file that is 10% of your VMFS volume. If you hit the roof and need space immediately, for instance to vMotion VMs, you delete that dummy file and recover 10% of your VMFS disk space. Great, isn’t it 🙂 (a small sketch of this trick follows after this list)
  • Guest OS file system fragmentation is still an issue, whether the server is physical or virtual. You need to defragment regularly.
  • With vCenter Server Linked Mode you can ‘only’ manage 100 hosts and 1,000 VMs per vCenter Server instance. A standalone vCenter Server can manage 300 hosts and 3,000 VMs.
  • SMASH is a plug-in that monitors host health status. SMASH stands for System Management Architecture for Server Hardware.
  • vCenter Server services are also monitored by a vCenter plug-in and displayed in the vCenter client.
  • vMotion works as follows (a toy sketch of the pre-copy loop follows after this list):
    • vMotion is initiated automatically (DRS) or manually (User),
    • The VM’s memory state is copied across the vMotion network to the target host. The VM is still up and running, so changes keep happening in memory; a list of changed memory pages is kept on the source host in a memory bitmap.
    • The step above is repeated until most of the memory has been copied (an internal threshold is hit) and few or no changes remain in the memory bitmap.
    • Then the VM is quiesced and no more activity occurs on it. The source host sends the last memory changes to the target host.
    • As soon as the VM is quiesced on the source, it is resumed on the target. This takes a few milliseconds.
    • Additionally, a RARP packet notifies the physical switches that the VM’s MAC address is now behind a new switch port.
    • Finally, the VM is deleted from the source host.
  • A common vMotion problem: the port group name for the vMotion network is not identical across the hosts in the cluster.
  • Download VMware CPU Host Info from run-virtual.com. It runs in a VM.
  • A DRS cluster is managed by vCenter Server and has the following resource management capabilities:
    • Initial placement: DRS either places the VM automatically or makes recommendations,
    • Load balancing,
    • Power management with VMware DPM enabled,
    • VM affinity rules.
  • Read vSphere Resource Management Guide.
  • It is a best practice to enable EVC from the beginning, so that you can later add hosts with newer processors and still be able to vMotion VMs onto the new hardware.
  • DRS cluster affinity rules are valid for a maximum of two VMs.
  • VMware DPM: you could have an issue with Wake-on-LAN (WOL) if, for instance, the switch port is hard-set to gigabit/full duplex while the powered-off host’s NIC drops its speed to 100 Mb (a power-saving feature).
  • VMware HA provides automatic restart of VMs in case of physical host failures. Read Duncan Epping’s blog article on that matter.
  • HA detects a host failure by monitoring the heartbeats sent over the hosts’ Service Console network; heartbeats are sent every second.
  • To avoid a split-brain condition, HA waits 12 seconds before declaring a host isolated.
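
Since %CSTP came up above, here is a minimal sketch of how you could flag high co-stop values from an esxtop batch capture. It assumes you have already recorded a CSV with something like `esxtop -b -d 5 -n 12 > cpu.csv`, that the co-stop counters show up in columns whose headers contain “CSTP”, and that 3% is a worrying level; the file name, column naming and threshold are assumptions to check against your own capture, not official values.

```python
import csv

CAPTURE = "cpu.csv"        # hypothetical esxtop batch-mode capture
CSTP_THRESHOLD = 3.0       # illustrative threshold, not an official VMware figure

with open(CAPTURE, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    # Pick out columns that look like per-VM co-stop counters.
    cstp_cols = [i for i, name in enumerate(header) if "CSTP" in name.upper()]

    for row in reader:                     # one row per sample interval
        for i in cstp_cols:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue
            if value > CSTP_THRESHOLD:
                print(f"High co-stop: {header[i]} = {value:.2f}%")
```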
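
To make the ballooning and VMKernel swap numbers above concrete, here is a small worked example. The 65% figure is the Mem.CtlMaxPercent default mentioned above; the VM sizes are made up for illustration.

```python
def vswp_size_mb(configured_mb, reservation_mb=0):
    """VMKernel swap (.vswp) size: configured memory minus the reservation."""
    return configured_mb - reservation_mb

def max_balloon_mb(configured_mb, ctl_max_percent=65):
    """Most memory the balloon driver may reclaim (Mem.CtlMaxPercent, default 65%)."""
    return configured_mb * ctl_max_percent / 100

# Example: a 4 GB VM with a 1 GB reservation.
print(vswp_size_mb(4096, 1024))   # 3072 -> a 3 GB .vswp file on disk
print(max_balloon_mb(4096))       # 2662.4 -> up to ~2.6 GB can be ballooned away
```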
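
The resource pool point above, that 1,000 CPU shares can mean very different things depending on the parent pool, can be shown with a toy proportional-share calculation. This is only a sketch of the share arithmetic under contention, not the actual DRS/scheduler algorithm; the pool sizes and VM names are invented.

```python
def entitlement(pool_mhz, vm_shares):
    """Split a pool's CPU capacity (MHz) among its VMs in proportion to their shares."""
    total = sum(vm_shares.values())
    return {vm: pool_mhz * shares / total for vm, shares in vm_shares.items()}

# Two pools of equal size; every VM has 1,000 CPU shares, yet entitlements differ.
prod = entitlement(10000, {"prod-vm1": 1000, "prod-vm2": 1000})
test = entitlement(10000, {f"test-vm{i}": 1000 for i in range(1, 11)})

print(prod["prod-vm1"])   # 5000.0 MHz under contention
print(test["test-vm1"])   # 1000.0 MHz under contention
```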
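
The 10% dummy-file tip is mostly about the arithmetic; here is a minimal sketch, assuming a POSIX-like host where the datastore is mounted under /vmfs/volumes/<name> and Python is available. In practice you would probably just use dd from the service console; the datastore path and file name below are hypothetical.

```python
import os

DATASTORE = "/vmfs/volumes/datastore1"                 # hypothetical datastore path
PLACEHOLDER = os.path.join(DATASTORE, "emergency-buffer.bin")

# Size the placeholder at 10% of the volume's total capacity.
stats = os.statvfs(DATASTORE)
target = stats.f_frsize * stats.f_blocks // 10

# Write zeros in 1 MB chunks so the space is actually allocated, not sparse.
chunk = b"\0" * (1024 * 1024)
with open(PLACEHOLDER, "wb") as f:
    written = 0
    while written < target:
        f.write(chunk)
        written += len(chunk)

print(f"Reserved {written / 1024**3:.1f} GB; delete this file to free space in an emergency.")
```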
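
The iterative pre-copy described in the vMotion steps above is easy to picture as a loop: each pass copies the pages dirtied during the previous pass, until the remaining set is small enough to send while the VM is briefly quiesced. The dirty rate and thresholds below are invented purely for illustration, not the real internal values.

```python
import random

TOTAL_PAGES = 100_000          # toy VM memory size, in pages
DIRTY_RATE = 0.05              # fraction of copied pages re-dirtied per pass (made up)
SWITCHOVER_THRESHOLD = 500     # quiesce once this few dirty pages remain (made up)
MAX_PASSES = 20                # give-up threshold (made up)

dirty = TOTAL_PAGES            # initially every page still has to be copied
for p in range(1, MAX_PASSES + 1):
    copied = dirty
    # While we copy, the running VM dirties some pages again (tracked in the bitmap).
    dirty = int(copied * DIRTY_RATE * random.uniform(0.5, 1.5))
    print(f"pass {p}: copied {copied} pages, {dirty} dirtied during the copy")
    if dirty <= SWITCHOVER_THRESHOLD:
        break

print(f"quiesce the VM, send the last {dirty} pages, then resume it on the target")
```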

That’s it for today. Tomorrow is the last day; there will be a test at the end of it, nothing official though. Come back for the fifth and last part.


About PiroNet

Didier Pironet is an independent blogger and freelancer with 15+ years of IT industry experience. Didier is also a former VMware, Inc. employee, where he specialised in Datacenter and Cloud Infrastructure products as well as Infrastructure, Operations and IT Business Management products. Didier is passionate about technology; he is a creative and visionary thinker who expresses himself with passion and excitement, hopefully inspiring people to embrace innovation and change.

3 Responses to VMware vSphere Fast Track Day#4 – Lessons Learned

  1. larstr says:

    “A two-vCPU VM gets scheduled on 2 HEC at a time, that means two physical CPU must be free at the same time to scheduled the HEC.”

    This is not entirely true anymore. With relaxed co-scheduling they don’t have to be free at the exact same time: http://communities.vmware.com/docs/DOC-4960

    Lars

  2. deinoscloud says:

    Hi Larstr, you’re damn right and I have updated the page for relaxed co-scheduling with ESX3 and 4.

  3. Pingback: VMware vSphere Fast Track Day#5 – Lessons Learned « DeinosCloud
