Looks like the same issue shows up with Intel® Gigabit Ethernet Controllers such the i350-T4. And the same fix is to be applied…
It’s been a long time I haven’t published anything on me blog site. As you may know I’ve been working for Bull for the last 21 months.
During that time I could familiarised with the awesomeness of Bull’s bullion server. A modular 4-socket x86 server that scales up up to 16-socket by stacking 4 modules.
For connectivity Bull certified state-of-the-art Intel® 82599 10 Gigabit Ethernet Controller. However recently I came across a strange issue with some of those bullion servers. All of the sudden they would lost network connectivity. Starting with one port, then eventually both ports of a dual-port Intel® 82599 10 Gigabit Ethernet Controller. The very specific symptom in this case is that the port would failed but would remain up in vSphere, while no traffic would go through. Also the Observed IP Ranges would show ‘none’ instead of the usual range of IP addresses. The basic troubleshooting steps and error codes would reveal a PCIe bandwidth issue to the CPU huh!?
Long story short, the issue comes from a tiny BIOS function dealing with the power management of PCIe adapters known as ASPM or Advanced State Power Management.
Basically Intel® 82599 10 Gigabit Ethernet Controller are just not compatible with this function and I would bet none of the other server grade 10 Gigabit Ethernet Controllers are neither compatible with this function. Who would want his PCIe card being put down to save on power on production server!
So to resolve this issue, disable ASPM on the BIOS/EFI configuration of the server. When ASPM is disabled, PCIe adapters without ASPM support operate normally.
Also best is to follow recommendations in the Performance Best Practices for VMware vSphere® 5.5 especially for the hardware BIOS settings.
Sources: Bull and VMware KB 2076374