I was reading a recently published article on Microsoft TechNet about how Microsoft.com moved to a virtualized infrastructure. The article describes why and how Microsoft deployed a virtualized infrastructure using in-house products such as Microsoft Hyper-V and Windows Server 2008 R2. Eating your own dog food is popular these days. VMware and Citrix do that as well. Nothing wrong here, but not all virtualization infrastructures are equal, and some best practices can hurt a company’s credibility…
What actually surprised me is a feature called ‘Maintenance mode’. Microsoft describes it as: ‘a server to not be available for virtual machine deployments’. I thought this was the same feature as VMware’s Maintenance Mode. That mode helps you evacuate VMs so that you can, for instance, patch your hosts, replace faulty hardware, apply config changes that require a reboot, etc. Most of the time, a host is put in Maintenance Mode for a very short period; once it is back up, you exit Maintenance Mode and the host participates in the cluster again, right?
I kept on reading and found this other description: ‘Maintenance mode enables operators to perform live migrations in a prescribed fashion rather than allowing virtual machines to be sent to more than one host within the cluster…’. If I get it right, Microsoft has a dedicated host server to which VMs are migrated. Perhaps that’s because Microsoft doesn’t have a DRS-like feature…
Again I kept on reading: ‘Also, if a node fails, this feature enables faster recovery time because the virtual machines from the failed node quickly migrate to the maintenance-mode node’. Now I think I got it! In other words, a host server is set aside in ‘Maintenance mode’ and therefore becomes that cluster’s target for a passive quick migration if another node fails.
Microsoft’s best practice calls for one host server in ‘Maintenance mode’ for every 15 active host servers!
Reading further, I came across this paragraph: ‘In addition to the total number of nodes, an organization should consider the type of cluster being deployed when it is determining the number of maintenance modes to have. For example, if the cluster hosts databases or other systems that maintain state, a minimum of two maintenance-mode nodes provides better protection against unexpected downtime.’ That now makes two host servers in ‘Maintenance mode’ for every 14 active host servers. Personally I call this a huge waste of resources 😮
Let’s do the math: one ‘Maintenance mode’ host server for every 15 active host servers, or two for every 14. If I transplanted this ‘best practice’ to one of my biggest customers, they would have between 16 and 32 blade servers powered on but doing nothing, just idling and waiting! That’s up to two fully populated HP C7000 enclosures sitting there and waiting. Microsoft, are you kidding me?
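For anyone who wants to redo that arithmetic with their own numbers, here is a rough sketch. The fleet size of 256 blades is my assumption (the customer's real figure is not given here); it is simply the size that reproduces the 16-to-32 range above. The 16-node cluster size follows from 15 + 1 or 14 + 2, and 16 half-height blades per C7000 matches the "two fully populated enclosures" figure. The function name `idle_hosts` is purely illustrative.

```python
# Back-of-the-envelope estimate of hosts held idle as 'Maintenance mode' targets.
# All figures below are assumptions for illustration, not customer data.
CLUSTER_SIZE = 16      # 15 active + 1 standby, or 14 active + 2 standby
BLADES_PER_C7000 = 16  # half-height blades in a fully populated enclosure


def idle_hosts(total_hosts: int, standby_per_cluster: int,
               cluster_size: int = CLUSTER_SIZE) -> int:
    """Return how many hosts sit idle if every full cluster reserves standby nodes."""
    clusters = total_hosts // cluster_size
    return clusters * standby_per_cluster


if __name__ == "__main__":
    fleet = 256  # assumed fleet size, chosen to reproduce the 16-to-32 range
    for standby in (1, 2):
        idle = idle_hosts(fleet, standby)
        print(f"{standby} standby node(s) per cluster: {idle} idle blades, "
              f"{idle / BLADES_PER_C7000:.0f} enclosure(s), "
              f"{idle / fleet:.1%} of the fleet")
```

Change `fleet` to your own host count to see what the same ratio would cost you in hardware that only waits for a failure.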
Fortunately, my customer runs vSphere/vCenter with DRS, HA, vMotion and many other features that only VMware can offer today to customers from SMBs up to the enterprise class. Better resource utilization, higher consolidation ratios, fewer physical servers, a smaller footprint, high availability and a greener datacenter. VMware, what else?