I wanted to follow up on my previous post, Is a Virtual Machine Bringing Your Storage Down?, with some tests in my home lab. Nothing ‘real life’, but enough to get familiar with these new features.
Controlling IOPS is crucial in an environment with shared storage resources. VMware vSphere 4.1 has several techniques to do that, and the premium feature is known as SIOC, or Storage I/O Control. If you have the proper license, just go for it and turn it on! Don’t forget to follow the requirements and recommendations.
Now if you haven’t bought vSphere 4.1 Enterprise Plus licenses, you still have other built-in storage features that will be used by your vSphere 4.1 hosts to manage, with great fairness, available storage resources. To name a few: Disk.schednumreqoutstanding, QFullSampleSize and QFullThreshold.
This week in my man cave, I played a bit with a couple of those new vSphere 4.1 features, namely sched.scsix:x.throughputCap and sched.scsix:x.bandwidthCap. I have recorded and published a small video you can watch below.
In summary here are my observations:
- Apparently you cannot use both parameters at the same time. It is either throughputCap or bandwidthCap per disk. If that makes sense to anyone, please comment 🙂
- If you do use both parameters for the same disk, they are just ignored by the host.
- sched.scsi0:0.bandwidthCap must be set in Bps. The VMware KB isn’t that clear about this. Why can’t I set IOps here?
- If, for instance, your VM has two disks and you set sched.scsi0:0.bandwidthCap=100IOps for the first disk and sched.scsi0:1.bandwidthCap=50IOps for the second disk, the values are added up and the actual limit for either disk is 150IOps. Is this a bug or by design?
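To make the observations above concrete, here is a minimal Python sketch that models them. It is not VMware code and says nothing about how the host implements the limits. The function names (`parse_caps`, `conflicting_disks`, `observed_iops_limit`) are hypothetical, and the rules it encodes (both caps on one disk are ignored, IOps values are summed across disks) are only what I saw in my lab, not documented behaviour.

```python
import re
from collections import defaultdict

# Matches keys such as sched.scsi0:1.throughputCap (controller 0, unit 1).
_KEY = re.compile(r"^sched\.scsi(\d+):(\d+)\.(throughputCap|bandwidthCap)$")


def parse_caps(entries):
    """Group cap settings by virtual disk, e.g. {"scsi0:1": {"bandwidthCap": "50IOps"}}."""
    disks = defaultdict(dict)
    for key, value in entries.items():
        match = _KEY.match(key)
        if match:
            controller, unit, kind = match.groups()
            disks[f"scsi{controller}:{unit}"][kind] = value
    return dict(disks)


def conflicting_disks(disks):
    """Disks that carry both caps (assumption from the lab: the host then ignores both)."""
    return sorted(disk for disk, caps in disks.items() if len(caps) == 2)


def observed_iops_limit(disks):
    """Sum the IOps values of all disks that carry exactly one cap.

    Assumption: this mirrors the added-up limit observed in the lab,
    whichever of the two keys holds the IOps value.
    """
    total = 0
    for caps in disks.values():
        if len(caps) != 1:
            continue  # both caps set: treated as ignored
        (value,) = caps.values()
        if value.endswith("IOps"):
            total += int(value[: -len("IOps")])
    return total


if __name__ == "__main__":
    settings = {
        "sched.scsi0:0.bandwidthCap": "100IOps",
        "sched.scsi0:1.bandwidthCap": "50IOps",
    }
    disks = parse_caps(settings)
    print(conflicting_disks(disks))    # []
    print(observed_iops_limit(disks))  # 150
```

With the two-disk example from the list, the sketch reports a combined limit of 150, which is the figure each disk was actually held to in my test.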
How do these two virtual machine parameters play with Disk.schednumreqoutstanding, QFullSampleSize, QFullThreshold and SIOC? I don’t have the whole picture on this subject yet, so I can’t write anything relevant here. If anyone knows more, please enlighten us 😉
I would hate to inherit any virtual infrastructure that had previous owners twisting these kinds of nerd knobs 🙂
True… But hey, it would not be fun if it were too easy 🙂
Hi Didier,
Isn’t it the job of a VMware consultant to keep things simple instead of overcomplicating them by setting advanced parameters that are not needed or recommended?
Hi Marnix and thx for your comment,
Keeping things simple is a leitmotif for anybody designing any type of environment. That’s my opinion.
Would I propose this to a customer? I don’t think so. I would certainly advise them to turn on SIOC.
The article is based on a VMware KB which proposes alternative options.
I don’t like it either, but I agree with Steve. Read this to see why I feel you shouldn’t be touching Disk.SchedNumReqOutstanding: http://www.yellow-bricks.com/2011/06/23/disk-schednumreqoutstanding-the-story/
By the way, why don’t you just refer to the “limit” as part of the virtual machine properties?
I have gone through your post… Excellent read as usual! It shows the gap between information available to the public and to an internal audience only 😉
You’re right, sched.scsix:x.throughputCap can be set through the VM properties page. This is not true for sched.scsix:x.bandwidthCap though… or I’m not looking in the right place.
Thx for your comments anyway…
Pingback: Limiting disk I/O from a specific virtual machine | UP2V
Didier, thanks for your comment on my blog post at http://v-front.blogspot.com/2011/08/how-to-throttle-that-disk-io-hog.html covering the same topic. It led me to do more testing and digging, and I finally found out how the scheduler really handles the limits. It does it on a per-datastore(!) basis. I updated my post with some more details…
Detailed description: http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1038241&sliceId=1&docTypeID=DT_KB_1_1&dialogID=756586213&stateId=1%200%20756588783
Limiting disk I/O from a specific virtual machine (1038241)
It explains why the total throughput is limited to 150 IOps.
Thx for the link to this VMW KB.
There is a direct impact on KAVG/QAVG. I have been looking for a few months at how to apply an I/O limit without negative side effects.
When an I/O limit is set, the VMkernel has to do some I/O throttling and thus inserts ‘delays’, which show up in KAVG/QAVG, I guess…
The kernel holds all the SCSI commands and then passes them to the device (LUN) at the I/O rate that is allowed.
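The queuing effect described in these last comments can be illustrated with a toy model. This is only a sketch under a loud assumption: commands are released to the device at evenly spaced intervals of 1/limit seconds. It is not how the VMkernel scheduler is actually implemented, and `queue_delays` is a hypothetical name.

```python
def queue_delays(arrival_times, iops_limit):
    """Return the time (seconds) each request waits before being released.

    Toy model, not VMkernel behaviour: releases are spaced at least
    1/iops_limit seconds apart, and requests are served in arrival order.
    """
    interval = 1.0 / iops_limit
    next_slot = 0.0  # earliest time the next command may be released
    delays = []
    for arrival in arrival_times:
        release = max(arrival, next_slot)
        delays.append(release - arrival)
        next_slot = release + interval
    return delays


if __name__ == "__main__":
    # A guest issuing 200 I/Os per second for one second against a 100 IOps limit.
    arrivals = [i / 200 for i in range(200)]
    delays = queue_delays(arrivals, iops_limit=100)
    print(f"first wait: {delays[0]:.3f}s, last wait: {delays[-1]:.3f}s")
```

In this model a guest that stays below the limit never waits, while a guest that exceeds it sees its wait time grow with every command. That growing wait is the kind of kernel-side delay one would expect to surface as higher KAVG/QAVG.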