Do You VMOSA?

Over the past years, as a contractor on several missions helping Organisations assess, diagnose, audit and evaluate their virtual infrastructures and cloud initiatives, I noticed that many of them simply lack a strategic planning process. Those that have actually gone through the process, or something like it, often ‘forgot’ to communicate the outcome to the rest of the Organisation or ‘forgot’ to update it regularly. Seriously!

Let’s (re-)define what a Strategic Planning Process actually is.

Simply put, it’s about a group of people, usually from within the Organisation, deciding together what they want to accomplish, aka the Vision, and how they are going to get there, aka the Action Plan.

The Strategic Planning Process helps define VMOSA, which stands for Vision, Mission, Objectives, Strategies, and Action Plans.

VMOSA is a practical planning process that can be used by any Organisation. It is a comprehensive planning tool to help an Organisation by providing a blueprint for moving from dreams to actions.

Let’s briefly define and characterise each individual part.

The VISION (The Dream)

The Vision communicates where the organisation believes it wants to be in the near future, often within 3 to 5 years. A Vision statement should have certain characteristics:

  • Future focused
  • Challenging
  • Inspiring
  • Relevant
  • Concise

The MISSION (The What and the Why)

Developing a Mission statement is the next step in the strategic planning process. An organisation’s Mission statement describes what the organisation is doing and why it’s doing it. A Mission statement should have certain characteristics:

  • Purpose oriented
  • Broad
  • Concise

The OBJECTIVES (How Much Of What Will Be Accomplished By When)

The next step is to develop the specific Objectives that are focused on achieving the Mission statement. An objective is derived from the goal, has the same intention as a goal, but it is more specific, quantifiable and verifiable than the goal. Remember that an Objective must start with the word “To”. An organisation’s Objectives generally lay out how much of what will be accomplished by when.

An objective has five basic characteristics, known as SMART:

  • Specific
  • Measurable
  • Achievable
  • Realistic
  • Timed

STRATEGIES (The How)

The next step in the process of VMOSA is developing your Strategies. Strategies explain how the organisation will reach its objectives.

There are basically three major steps in this process:

  • Collect information, internally and externally, to figure out where the organisation is now.
  • Synthesise it into a SWOT table.
  • Refresh and update your Vision to keep it relevant and clear.

There are four key questions you need to answer within your strategy planning session:

  • Who’s the strategy manager?
  • How to communicate the strategy?
  • Who’s accountable for the strategy?
  • How often is the strategy status updated?

ACTION PLAN (What Change Will Happen, Who Will Do What By When)

Finally, an organisation’s action plan describes in great detail exactly how strategies will be implemented, within the wide strategy boundaries, to accomplish the objectives developed earlier in this process.

Action steps are developed for each component of the intervention or changes to be sought. These include:

  • Action step(s): What will happen
  • Person(s) responsible: Who will do what
  • Date to be completed: Timing of each action step
  • Resources required: Resources and support (both what is needed and what’s available)
  • Barriers or resistance, and a plan to overcome them!
  • Collaborators: Who else should know about this action

A good tip is to meet regularly to talk about the plan and get updates. You will then make adjustments and adaptations to the plan based on feedback from the people who are responsible. Your action plan will need to be tried and tested and revised, then tried and tested and revised again.

IN SUMMARY

VMOSA is a great tool to help organisations with their strategic planning process. Strategic planning is a never-ending process. But at the end of the day, it is a worthless process if organisations don’t communicate the outcome to their staff and ensure all stakeholders have understood it.

Sources: OnStrategy, CommunityToolBox

You Just Failed Your Private Cloud Project… Why?

I have witnessed private cloud projects go belly up many times, by which I mean that these projects did not meet the primary goals their sponsors set. Not that the goals were impossible, unachievable or too exotic. Not at all! Actually, the goals are the same for many organisations investing in private cloud projects. By the way, here are the top 3 goals organisations primarily look at when they decide to go private cloud:

  1. Cut down cost
  2. Better quality of service
  3. Increase Business agility

So how come organisations can’t meet these simple goals? Where do they fail? Here are my 5 failure patterns:

  1. Managing-by-magazine. No strategic plans.
  2. Not understanding your own requirements. No business case, migrate vs. innovate.
  3. Lack of a holistic architectural discipline. Cloud will not save you from needing a good architecture design, buddy.
  4. IT organisation is not ready. Complexity to hide stupidity.
  5. Insufficient Skill Set. Politics, technology groupies, design by best practice.

Now let me tell you a secret. Here is the recipe for a successful cloud project:

  • Adapt your business processes.
  • Simplify your IT processes.
  • Get buy-in from key stakeholders.
  • Get the Experts in at the inception of the project.

And remember that Cloud Computing is much more than a technology from VMware, OpenStack or Microsoft. Cloud Computing is a business opportunity to reduce cost, improve quality of service and increase Business agility.

The Importance of The Non-Functional Requirements

As an architect, one of my objectives is to collect, and often to define, both the functional and non-functional requirements of a project. That seems so obvious, right!?

As a former VMware employee, I was educated and trained to follow the Zachman Framework along with Thomas Andrews’s Functional versus Non-Functional Requirements and Testing principles. Those who are VCDXs or on their way to defend a VCDX design know exactly what I’m talking about.

Back to the title and the importance of the non-functional requirements. Let me illustrate this statement with the following short story and a picture. As you know a picture is worth a thousand words.

Once upon a time, there was a proud Hummer owner. After wearing out his tires, he wanted a new set of wheels for his monster 4×4 truck. He went to the nearest garage and asked for four wheels, nothing more, nothing less. In no time, the garage front desk clerk sold the guy four wheels and had them mounted on the SUV… And voilà!

Picture: a Hummer fitted with wooden wheels

Good laugh, isn’t it :)

Did the garage front desk clerk sell the appropriate wheels? Well, probably YES, he did!

Those things ARE wheels and the main function of a wheel IS to rotate. This is WHAT a wheel must be able to do or perform.

Beyond the fun and the buzz, we cannot imagine this truck winning the Dakar Rally with such wheels, can we!?

That’s where the importance of non-functional requirements comes into the picture.

A non-functional requirement states HOW a functional requirement should behave, on top of WHAT the function should do.

Additionally, non-functional requirements help you measure the quality of the function.

If only the garage front desk clerk or the Hummer owner had defined the non-functional requirements, we probably wouldn’t have these wooden wheels mounted on the SUV (and we wouldn’t have this funny picture either).
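The wheel story maps neatly onto requirement checks. In the hypothetical sketch below, the functional requirement (WHAT: the wheel rotates) is met by the wooden wheel, while the non-functional requirements (HOW WELL: load rating, off-road grip) catch the problem. The field names and thresholds are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Wheel:
    material: str
    rotates: bool
    load_rating_kg: int  # hypothetical per-wheel load capacity


def meets_functional(wheel: Wheel) -> bool:
    # Functional requirement, the WHAT: a wheel must rotate.
    return wheel.rotates


# Non-functional requirements, the HOW: thresholds are illustrative only.
NON_FUNCTIONAL = {
    "load rating >= 1000 kg": lambda w: w.load_rating_kg >= 1000,
    "rubber tyre for off-road grip": lambda w: w.material == "rubber",
}


def failed_non_functional(wheel: Wheel) -> list[str]:
    """Return the names of the non-functional requirements the wheel misses."""
    return [name for name, check in NON_FUNCTIONAL.items() if not check(wheel)]


wooden = Wheel(material="wood", rotates=True, load_rating_kg=300)
print(meets_functional(wooden))       # True: it is a wheel, it rotates
print(failed_non_functional(wooden))  # both non-functional checks fail
```

A review that only ran `meets_functional` would happily sign off the wooden wheels.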

Many times I’ve seen and reviewed architecture design documents where only functional requirements were defined. Such designs share a common behaviour… They tend to fail to address the overall objectives of the project.

 

Back To The Basics

During my time at Bull I had the opportunity to set up a Lunch&Learn program. For those who don’t know what it is, briefly, it is a training event held during lunch time, where the employer usually offers a complimentary lunch.

The program had a huge positive impact and was highly appreciated by the employees at Bull, with a packed room for most of the topics… maybe because of the free lunch :) OK guys, just kidding here.

The format of a Lunch&Learn program does not allow you to deep dive into a topic; it is really about setting the basics and getting a good grasp of the concepts and the terminology.

And that’s the trigger of this post.

You would be surprised how much a topic like Cloud Computing is misunderstood. Add to that the misuse of the term in many marketing materials, white papers, articles, blog posts, etc.

The big issue with this kind of situation arises when we need to sit down and discuss a project: everyone has their own definition, terminology and conceptual view of the matter. Everyone has an opinion, more or less pertinent, but there is no consensus to move forward. How many of you have had endless meetings just to agree on what is or is not a specific topic such as Cloud Computing?

That’s why it is important to get back to the basics and start any project by (re-)defining the baseline, the basics, so that we all speak the same language and share the same concepts and terminology. No misinterpretation, no misunderstanding, no miscommunication, no misperception, no mis-xxx, for the good of the project…

I would like to hear from you. Do you set the basics before starting a project? How do you deal with the mis-xxx? Do you define concepts and terminology within your architecture documentation, for instance?

Physical Network Connectivity Lost And Intel® 82599 10 Gigabit Ethernet Controller

[UPDATE]
Looks like the same issue shows up with Intel® Gigabit Ethernet Controllers such as the i350-T4, and the same fix applies…
[/UPDATE]

It’s been a long time since I published anything on my blog. As you may know, I’ve been working for Bull for the last 21 months.

During that time I became familiar with the awesomeness of Bull’s bullion server, a modular 4-socket x86 server that scales up to 16 sockets by stacking 4 modules.

For connectivity, Bull certified the state-of-the-art Intel® 82599 10 Gigabit Ethernet Controller. However, I recently came across a strange issue with some of those bullion servers. All of a sudden they would lose network connectivity, starting with one port, then eventually both ports of a dual-port Intel® 82599 10 Gigabit Ethernet Controller. The very specific symptom in this case is that the port would fail but remain up in vSphere, while no traffic would go through. Also, the Observed IP Ranges would show ‘none’ instead of the usual range of IP addresses. The basic troubleshooting steps and error codes would point to a PCIe bandwidth issue to the CPU, huh!?

Long story short, the issue comes from a tiny BIOS function dealing with the power management of PCIe adapters, known as ASPM or Active State Power Management.

Basically, the Intel® 82599 10 Gigabit Ethernet Controller is just not compatible with this function, and I would bet that the other server-grade 10 Gigabit Ethernet Controllers are not compatible with it either. Who would want their PCIe card put to sleep to save power on a production server!

So to resolve this issue, disable ASPM in the BIOS/EFI configuration of the server. When ASPM is disabled, PCIe adapters without ASPM support operate normally.

It is also best to follow the recommendations in the Performance Best Practices for VMware vSphere® 5.5, especially for the hardware BIOS settings.

Sources: Bull and VMware KB 2076374

Could DINO Be The Future Of vSphere NUMA Scheduler?

Dee-No

DINO, the future of vSphere NUMA scheduler, uh!
First things first, DINO is not Dino… Dino is one of The Flintstones’ fictional characters.
Flintstones. Meet the Flintstones. They’re the modern stone age family.
From the town of Bedrock, They’re a page right out of history…yabba dabba doo time!
All right, all right. DINO is not Dino. So what is DINO? I’ll leave that for later.
For now let’s focus on NUMA design and vSphere NUMA Scheduler.

So what is NUMA?

Wikipedia says: “Non-Uniform Memory Access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors. NUMA architectures logically follow in scaling from symmetric multiprocessing (SMP) architectures”

NUMA is often contrasted with Uniform Memory Access (UMA) which is a shared memory architecture used in parallel computers. All the processors in the UMA model share the physical memory uniformly. In a UMA architecture, access time to a memory location is independent of which processor makes the request or which memory chip contains the transferred data. Read more at Wikipedia.

Figure 1 shows a classic SMP system where there is usually a single pool of memory, also referred to as Uniform Memory Access (UMA). That is, memory access time is equal for all processors. Contention-aware algorithms work well here.

Figure 1 : SMP system - Uniform Memory Access (UMA)

The main drawback of the UMA architecture is that it doesn’t scale: in symmetric multiprocessing (SMP) architectures, many processors must compete for bandwidth on the same system bus. That’s why server vendors added NUMA design on top of SMP design. The first commercial implementation of a NUMA-based Unix system was the Symmetrical Multi Processing XPS-100 family of servers, designed by Dan Gielan of VAST Corporation for Honeywell Information Systems Italy (HISI). In 1991 Honeywell’s computer division was sold to Groupe Bull. How interesting is that!

Figure 2 shows a classic SMP system with Distributed Shared Memory (DSM). In a DSM system there are multiple pools of memory, and the latency to access memory depends on the relative position of the processor and memory. This is also referred to as Non-Uniform Memory Access, or NUMA.

Figure 2 : SMP system - Distributed Shared Memory (DSM) - Non-Uniform Memory Access (NUMA)

Major benefit: each processor has local memory with the lowest latency. On the other hand, remote memory access is slower. Intel says that remote access latency can be up to 70% higher, and bandwidth less than half of local access bandwidth.
But the biggest downside of DSM is that it only works well if the operating system is “NUMA-aware” and can efficiently place memory and processes. The OS scheduler and memory allocator play a critical role here.
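To see what the Intel figures mean in practice, here is a back-of-the-envelope calculation of average memory latency. It assumes a hypothetical fraction f of memory accesses go to remote memory and takes the worst case of remote latency 70% above local; f is an illustrative parameter, not a measured value.

```latex
t_{\mathrm{avg}} = (1 - f)\, t_{\mathrm{local}} + f\, t_{\mathrm{remote}},
\qquad t_{\mathrm{remote}} \le 1.7\, t_{\mathrm{local}}
\;\Rightarrow\; t_{\mathrm{avg}} \le (1 + 0.7 f)\, t_{\mathrm{local}}
```

With half of the accesses remote (f = 0.5), average latency can reach 1.35 times local latency, which is why NUMA-aware placement of memory and processes matters.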

vSphere is NUMA-aware as long as the BIOS reports the NUMA topology, that is, as long as the BIOS builds a System Resource Allocation Table (SRAT), so that the ESX/ESXi host detects the system as NUMA and applies NUMA optimizations. If you enable node interleaving (also known as interleaved memory), the BIOS does not build an SRAT, so the ESX/ESXi host does not detect the system as NUMA. Does that mean vSphere doesn’t do any NUMA optimization when node interleaving is enabled in the BIOS? I guess it doesn’t, since the scheduler doesn’t know the relationship between processors and local memory; as I understand it, that information is only provided by the SRAT.

What are vSphere NUMA optimizations I’m referring to?

Before we deep dive into vSphere NUMA optimizations, let’s first define a Home Node. A Home Node is one of the system’s NUMA nodes containing processors and local memory, as indicated by the System Resource Allocation Table (SRAT).

There are two main vSphere NUMA optimization algorithms and settings in the vSphere NUMA Scheduler:

  1. Home Nodes and Initial Placement. When a virtual machine is powered on, ESX/ESXi assigns it a home node in a round robin fashion. To work around imbalanced systems when virtual machines are stopped or become idle, there is a second set of algorithms and settings called,
  2. Dynamic Load Balancing and Page Migration. ESX/ESXi combines the traditional initial placement approach with a dynamic rebalancing algorithm. Periodically (every two seconds by default), the system examines the loads of the various nodes and determines if it should rebalance the load by moving a virtual machine from one node to another. This calculation takes into account the resource settings for virtual machines and resource pools, to improve performance without violating fairness or resource entitlements.
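The two algorithms above can be pictured with a toy model. This is not VMware’s code: the class and method names are hypothetical, CPU demand is an abstract number, and the real scheduler weighs far more (shares, reservations, memory locality, migration cost). It only illustrates round-robin initial placement followed by a rebalancing pass that moves a VM when doing so narrows the imbalance.

```python
from itertools import cycle
from typing import Optional


class NumaSchedulerSketch:
    """Toy model of home-node placement plus periodic rebalancing (hypothetical)."""

    def __init__(self, num_nodes: int):
        self.num_nodes = num_nodes
        self._round_robin = cycle(range(num_nodes))
        self.home: dict[str, int] = {}      # VM name -> home node
        self.demand: dict[str, float] = {}  # VM name -> CPU demand (arbitrary units)

    def power_on(self, vm: str, demand: float) -> int:
        # Initial placement: hand out home nodes in round-robin order.
        node = next(self._round_robin)
        self.home[vm] = node
        self.demand[vm] = demand
        return node

    def power_off(self, vm: str) -> None:
        del self.home[vm]
        del self.demand[vm]

    def node_loads(self) -> list[float]:
        loads = [0.0] * self.num_nodes
        for vm, node in self.home.items():
            loads[node] += self.demand[vm]
        return loads

    def rebalance(self) -> Optional[str]:
        # Dynamic load balancing (ESXi evaluates this every 2 s by default):
        # move one VM from the busiest to the least busy node, but only if
        # the move narrows the gap between those two nodes.
        loads = self.node_loads()
        busiest = max(range(self.num_nodes), key=loads.__getitem__)
        idlest = min(range(self.num_nodes), key=loads.__getitem__)
        gap = loads[busiest] - loads[idlest]
        for vm, node in self.home.items():
            if node == busiest and 0 < self.demand[vm] < gap:
                self.home[vm] = idlest
                return vm
        return None


sched = NumaSchedulerSketch(num_nodes=2)
for name, load in [("vm1", 4), ("vm2", 4), ("vm3", 2), ("vm4", 2)]:
    sched.power_on(name, load)   # homes: vm1->0, vm2->1, vm3->0, vm4->1
sched.power_off("vm2")
sched.power_off("vm4")           # node 1 is now idle: loads [6.0, 0.0]
print(sched.rebalance(), sched.node_loads())
```

Powering off VMs leaves node 1 idle, which round-robin placement alone cannot fix; the rebalancing pass moves `vm1` and brings the loads to 2 and 4.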

To get a detailed description of the algorithms and settings used by ESX/ESXi to maximize application performance while still maintaining resource guarantees, visit vmware.com.

The vSphere NUMA Scheduler has pretty smart algorithms and settings for initial placement and memory management. Still, I was wondering: could it be better?
For instance, by managing contention for shared resources that occurs when memory-intensive threads are co-scheduled on cores that share parts of the memory hierarchy, such as last-level caches and memory controllers.

Meet DINO

Sergey Blagodurov, Sergey Zhuravlev, Mohammad Dashti and Alexandra Fedorova, all from Simon Fraser University, have published a very interesting technical paper at Usenix.org about the limitations of current contention management on NUMA systems, proposing a new approach they called DINO, which stands for Distributed Intensity NUMA Online.

Those guys have discovered that state-of-the-art contention management algorithms fail to be effective on NUMA systems and may even hurt performance relative to a default OS scheduler.

Contention-aware algorithms have focused primarily on UMA (Uniform Memory Access) systems, where there are multiple shared last-level caches (LLC) but only a single memory node, equipped with a single memory controller, and memory can be accessed with the same latency from any core.

Remember that, unlike on UMA systems, thread migrations are not cheap on NUMA systems, because you also have to move the memory of the thread. So their approach to the problem is a mechanism that ensures superfluous migrations, those that are not likely to reduce contention, are avoided on a NUMA system.

Existing contention-aware algorithms perform NUMA-agnostic migration, so a thread may end up running on a node remote from its memory. The current vSphere NUMA scheduler mitigates this issue by detecting when most of a VM’s memory is on a remote node and then load balancing and migrating memory, as long as doing so doesn’t cause CPU contention on that NUMA node.

Could DINO Be The Future Of vSphere NUMA Scheduler?

DINO organizes threads into broad classes according to their miss rates and performs migrations only when threads change class, while trying to preserve thread-core affinities whenever possible. VMware vSphere NUMA optimizations could benefit from this by adding the DINO approach to the existing optimization code, possibly migrating memory based on threads and their miss rates as well.
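The core DINO idea, migrating only when a thread changes miss-rate class, can be sketched as follows. The class boundaries are hypothetical numbers chosen for illustration (the paper defines its own classification), and the code only decides whether a migration is warranted; it does not choose a destination or move memory.

```python
from bisect import bisect_right

# Hypothetical class boundaries, in last-level cache misses per 1000 instructions.
CLASS_BOUNDARIES = [2.0, 10.0]  # class 0: < 2, class 1: 2 to < 10, class 2: >= 10


def miss_rate_class(misses_per_kilo_instr: float) -> int:
    """Map a raw miss rate to a coarse class index."""
    return bisect_right(CLASS_BOUNDARIES, misses_per_kilo_instr)


class ClassChangeTracker:
    """Flags a thread for migration only when its miss-rate class changes.

    Fluctuations that keep a thread inside its class trigger nothing, so the
    thread keeps its core affinity and its memory stays where it is.
    """

    def __init__(self) -> None:
        self.last_class: dict[int, int] = {}  # thread id -> last observed class

    def should_migrate(self, tid: int, miss_rate: float) -> bool:
        new_class = miss_rate_class(miss_rate)
        old_class = self.last_class.get(tid)
        self.last_class[tid] = new_class
        # The first sample only establishes the class; no migration yet.
        return old_class is not None and new_class != old_class


tracker = ClassChangeTracker()
samples = [1.0, 1.8, 1.2, 12.0, 15.0]  # thread 7 turns memory-intensive
print([tracker.should_migrate(7, s) for s in samples])
```

Only the jump from the low class to the high class produces a migration; the wobble between 1.0 and 1.8 and the move from 12 to 15 do not, which is exactly the saving on a NUMA system where every migration also drags memory along.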

In vSphere 5.x VMware introduced vNUMA, which presents the physical NUMA topology to the guest operating system. vNUMA is enabled by default on VMs with more than 8 vCPUs, but you can change this by modifying the numa.vcpu.min setting. Is this an attempt to hand over the critical NUMA scheduling job to the guest OS, hoping it does a better job? It may seem a good approach, but it comes at the cost of losing control. In a shared environment such as a VMware environment, the virtual machine monitor should always be in control.
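The default behaviour described above boils down to a simple threshold. The hypothetical helper below encodes it, assuming the documented default of numa.vcpu.min (9, i.e. more than 8 vCPUs) and the virtual hardware version 8 requirement for vNUMA; it deliberately ignores other factors, such as the physical NUMA node size, that ESXi may also take into account.

```python
DEFAULT_NUMA_VCPU_MIN = 9  # default of the numa.vcpu.min advanced setting


def vnuma_exposed(vcpus: int, hw_version: int,
                  numa_vcpu_min: int = DEFAULT_NUMA_VCPU_MIN) -> bool:
    """Hypothetical helper: does the guest get a virtual NUMA topology?

    Simplified rule: virtual hardware version 8 or later, and at least
    numa.vcpu.min vCPUs. Other ESXi considerations are ignored here.
    """
    return hw_version >= 8 and vcpus >= numa_vcpu_min


print(vnuma_exposed(8, hw_version=9))                   # False: 8-way VM, default threshold
print(vnuma_exposed(12, hw_version=9))                  # True: 12-way VM, default threshold
print(vnuma_exposed(4, hw_version=9, numa_vcpu_min=4))  # True: threshold lowered
```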

Eggnog

I’m not privy to the secrets of the gods. I don’t have access to VMware developers and code. Thus what I’m saying here is based on a series of readings, articles and vendor architecture documents that I compiled and read through while preparing for Christmas Eve with an enhanced version of eggnog in my mug. Therefore I may be wrong, off-target or totally inaccurate in my conclusions…

If you have another point of view or a piece of information I don’t have, or if I missed something in my thought process, just post a comment. I’ll be very happy to hear from you!

Source: vmware, wikipedia.org, usenix.org, clavis.sourceforge.net

Letter To Santa

Dear Santa, I’ve been terrific at virtualising low-hanging fruit over the past years. I have reduced costs while increasing availability, reliability and performance for my applications. I’m a prodigy, I’m a super-hero!

Now my CIO asked me to realize the same wonder with our business mission-critical applications!

Those applications are massive man! They require a lot of resources and they need mainframe-style availability and performance.

Only monster VMs can cope with the load, and I need even bigger monster servers to hold them up!

Please Santa, I need you to get me some MONSTER BULLIONs!

Monster Bullion

Bull’s BCS Architecture – Deep Dive – Part 4

Following on from part 1, part 2 and part 3, here is… part 4 of this deep dive series on Bull’s BCS Architecture.

In the previous post I focussed on Intel RAS features that Bull’s BCS Architecture is leveraging to make the memory more reliable and available.

In part 4, I will cover additional features leveraged by Bull’s specific server architecture. Some of these features directly address customers who require the level of reliability, availability and serviceability they could only find in expensive mainframe systems.

Reliability

Reliability addresses the ability of a system or a component to perform its required functions.

Dual Path IO HBAs

Each bullion module can connect up to 3 HBAs per IO Hub (IOH), the Intel component that provides the interface between IO components such as PCIe buses and the Intel QPI-based processors. Those 3 HBAs can then be mirrored, within the same bullion module, to the HBAs attached to a second IO Hub. This teaming gives you fault-tolerant IO connectivity and, combined with VMware’s Native Multipathing Plugin (NMP), lets you load balance IO across the members of the team.
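The load-balancing side of that teaming can be pictured as round-robin path selection with failover. The sketch below is a hypothetical model in the spirit of NMP’s Round Robin path selection policy (VMW_PSP_RR), which by default switches to the next path after 1000 I/Os; the class, the path names and the reduced quota in the example are illustrative, not VMware code.

```python
class RoundRobinPaths:
    """Toy round-robin path selector that skips dead paths (hypothetical)."""

    def __init__(self, paths: list[str], ios_per_path: int = 1000) -> None:
        self.paths = list(paths)
        self.dead: set[str] = set()
        self.ios_per_path = ios_per_path
        self._index = 0
        self._issued = 0  # I/Os sent on the current path since the last switch

    def mark_dead(self, path: str) -> None:
        self.dead.add(path)

    def next_path(self) -> str:
        if all(p in self.dead for p in self.paths):
            raise RuntimeError("all paths are down")
        current = self.paths[self._index]
        if current in self.dead or self._issued >= self.ios_per_path:
            # Rotate to the next live path and restart the per-path counter.
            for _ in range(len(self.paths)):
                self._index = (self._index + 1) % len(self.paths)
                if self.paths[self._index] not in self.dead:
                    break
            self._issued = 0
        self._issued += 1
        return self.paths[self._index]


rr = RoundRobinPaths(["hba0", "hba1"], ios_per_path=2)
print([rr.next_path() for _ in range(5)])  # ['hba0', 'hba0', 'hba1', 'hba1', 'hba0']
rr.mark_dead("hba0")                       # e.g. an HBA or IOH failure
print([rr.next_path() for _ in range(3)])  # ['hba1', 'hba1', 'hba1']
```

While both HBAs are healthy, I/O alternates between them; once one fails, all traffic continues on the survivor without interruption.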

Four-Socket Two Boxboro IOH Topology – Courtesy of Intel

Availability

Availability of a system is typically measured as a factor of its reliability – as reliability increases, so does availability.
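A common way to put numbers on this is steady-state availability, computed from the mean time between failures (MTBF) and the mean time to repair (MTTR). The second formula is the textbook result for a pair of redundant components, such as two power supplies, and it assumes their failures are independent.

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
\qquad
A_{\text{pair}} = 1 - (1 - A)^{2}
```

For example, two units that are each 99.9% available give a pair availability of 1 - 0.001^2 = 99.9999%.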

Active/Passive Power-supplies

The bullion servers are equipped with two 1600W power supplies, which are 80+ Platinum level certified. They provide a full N+N redundancy for maximum availability.

For its mainframe systems, Bull has developed a patented solution based on the active/passive power supply principle. This solution provides the highest possible efficiency regardless of the requirements, while still providing maximum uptime.

This technology from mainframe systems is now available on the bullion.

What is it exactly? The unique active/passive power supply solution provides an embedded fault resiliency against the most common electrical outages: micro-outages.

Rather than relying on heavy and expensive UPS systems, bullion servers are equipped with an ultra-capacitor that bridges the switch from the active to the passive power supply in case of failure, and also protects against micro-outages.

The ultra-capacitor provides 300 ms of autonomy, sufficient to switch over or to avoid application unavailability during micro-outages.
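For a sense of scale, the energy the ultra-capacitor must deliver is power multiplied by hold-up time. Assuming, hypothetically, that the server draws the full 1600 W rating of one power supply and ignoring conversion losses:

```latex
E = P \cdot t = 1600\,\mathrm{W} \times 0.3\,\mathrm{s} = 480\,\mathrm{J}
```

A more lightly loaded server needs proportionally less energy for the same 300 ms.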

Bullion's Ultra-Capacitor - Courtesy of Bull

The active and passive roles rotate, and the passive PSU is frequently tested with failover and failback runs to guarantee its availability should the active PSU fail.

Bull claims an overall power consumption 20-30% below the competition.

Serviceability

Serviceability refers to the ability of technical support personnel to install, configure and monitor computer products, identify exceptions or faults, debug or isolate faults to their root cause, and provide hardware or software maintenance in pursuit of solving a problem and restoring the product into service.

Empower Maintainability

Bull eases the replacement of the most frequently failing motorized components, such as fans, power supplies and disk drives, which are responsible for over 80% of hardware failures. Replacing them has no impact whatsoever on production on bullion servers, since there is always a redundant part available to take over from the failed one.

These components are now Customer Replaceable Units (CRUs). This program empowers you to repair your own machine; other server vendors actually have the same policy. In situations where a computer failure can be attributed to an easily replaceable part (a CRU), Bull sends you the new part. You simply swap the old part for the new one, no tools required. It is simple and a major advantage: really fast service for you and reduced support and maintenance fees.

Increase Availability

On the other hand, some components can only be replaced by Support. These are Field Replaceable Units (FRUs).

To avoid downtime for the customer, and under the correct conditions, some FRUs can be excluded from the system at boot time: PSUs, processors, cores, QPI links, XQPI links, PCIe boards and embedded Ethernet controllers are among the elements that can be excluded at boot time, minimizing downtime during servicing.

RAS Monitoring

Each bullion module contains an embedded Baseboard Management Controller (BMC) for monitoring and administration functions. This embedded controller runs the Server Hardware Console (SHC).

Bullion servers offer the following built-in functions:

  • SHC access to all of the module components by standard out-of-band (non-functional) paths – the I2C and SMBus interfaces.
  • A dedicated network to interconnect all of the SHCs of a server without affecting the customer’s network.
  • Dynamic communications between the SHC and the BIOS.

The SHC provides this information to vCenter or any other industry standard System Management solution, with support for IPMI, SNMP and other industry standard interfaces.

’nuff said about Bull’s BCS Architecture. It’s time to witness the power of the beast, it’s time to see the greenness of the monster, it’s time to meet the monster bullion ™ – Stay tuned!

Source: Bull, Intel, Wikipedia

Bull’s BCS Architecture – Deep Dive – Part 3

The last couple of posts about Bull’s BCS Architecture have been quite intense, and I hope I’ve delivered the technical details you were expecting.

Here are the links to the entire deep dive series so far:

Now I want to talk about another feature that Bull’s BCS Architecture is leveraging: Intel RAS

What is RAS and what is its purpose?

Today’s crucial business challenges require handling unrecoverable hardware errors while delivering uninterrupted application and transaction services to end users. Modern approaches strive to handle unrecoverable errors throughout the complete application stack, from the underlying hardware to the application software itself.

RAS Flow - Courtesy of Intel

Such solutions involve three components:

  1. reliability, how the solution preserves data integrity,
  2. availability, how it guarantees uninterrupted operation with minimal degradation,
  3. serviceability, how it simplifies proactively and reactively dealing with failed or potentially failed components.

This post covers only the memory management mechanisms that provide reliability and availability. The next post will cover other mechanisms.

Memory Management mechanisms

Memory errors are among the most common hardware causes of machine crashes in production sites with large-scale systems.

Google® Inc. researchers conducted a two-year study of memory errors in Google’s server fleet (see Google Inc., “DRAM Errors in the Wild: A Large-Scale Field Study”).

Researchers observed that more than 8 percent of DIMMs and about one-third of the machines in the study were affected by correctable errors each year.

At the same time, the annual incidence of detected uncorrectable errors was 1.3 percent per machine and 0.22 percent per DIMM.

Memory module capacity has increased over the last two decades, following Moore’s law. In the ’80s you could buy 2MB memory modules; 20 years later, 32GB memory modules hit the market. That is roughly a 16,000x improvement.
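The arithmetic behind the 16,000x figure, using binary units (1 GB = 1024 MB):

```latex
\frac{32\,\mathrm{GB}}{2\,\mathrm{MB}} = \frac{32 \times 1024\,\mathrm{MB}}{2\,\mathrm{MB}} = 16384 = 2^{14}
```

That is 14 doublings in roughly 20 years (240 months), or about one doubling every 17 months.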

One of the unique reliability and availability features of the bullion is its RAM memory management and memory protection. From basic ECC up to Memory Mirroring, memory protection mechanisms can guarantee up to 100% memory reliability on the bullion.

Let’s have a look at some of those memory protection mechanisms available in the bullion:

ECC memory

The bullion supports traditional memory correction mechanisms such as ECC memory, which keeps a memory system effectively free from single-bit errors.
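To make “free from single-bit errors” concrete, here is the classic Hamming(7,4) single-error-correcting code. It is a teaching-sized stand-in, not what the bullion memory controller implements: ECC DIMMs use wider SECDED codes (typically 8 check bits per 64 data bits), but the principle is the same, since the syndrome points at the flipped bit.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, syndrome)."""
    syndrome = 0
    for position, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= position  # XOR of the positions of all set bits
    fixed = list(code)
    if syndrome:                  # non-zero syndrome = position of the bad bit
        fixed[syndrome - 1] ^= 1
    return [fixed[2], fixed[4], fixed[5], fixed[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                       # simulate a single-bit error at position 5
print(hamming74_decode(word))      # ([1, 0, 1, 1], 5)
```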

Double device Data Correction (DDDC)

Beyond ECC, the bullion provides much more sophisticated mechanisms such as Double Device Data Correction (DDDC), which can correct errors caused by the failure of up to two DRAM devices.

Double Device Data Correction - DDDC - Courtesy of Bull

DIMM & Rank Sparing

The commonly available DIMM Sparing is now being enhanced to provide Rank Sparing. With Rank Sparing of dual-rank DIMMs, only 12.5% of the memory capacity is set aside as a spare to enhance the reliability of the memory system. If the rate of ECC-corrected errors becomes too high, the system fails over to the spare. Note that DIMM and Rank Sparing do not protect against uncorrectable memory errors.
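One way to arrive at the 12.5% figure is one spare rank out of eight identical ranks on a channel, for example four dual-rank DIMMs. That population is a hypothetical example, not a documented bullion configuration; the helper below simply computes the reserved fraction exactly.

```python
from fractions import Fraction


def rank_sparing_overhead(ranks_per_channel: int, spare_ranks: int = 1) -> Fraction:
    """Fraction of a channel's capacity reserved as spare ranks.

    Assumes all ranks on the channel have the same size (a simplification;
    real population rules are platform specific).
    """
    if not 0 < spare_ranks < ranks_per_channel:
        raise ValueError("need at least one spare rank and one active rank")
    return Fraction(spare_ranks, ranks_per_channel)


# Hypothetical channel: four dual-rank DIMMs = 8 ranks, one kept as a spare.
overhead = rank_sparing_overhead(ranks_per_channel=4 * 2)
print(overhead, float(overhead), float(1 - overhead))
```

With 8 ranks and one spare, 1/8 (12.5%) of the capacity is reserved and 87.5% remains usable.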

DIMM Sparing - Rank Sparing - Courtesy of Bull

MCA Recovery

In a virtualized environment, the Virtual Machine Manager (VMM) shares the silicon platform’s resources with each virtual machine (VM) running an OS and applications.

In systems without MCA recovery, an uncorrectable data error would cause the entire system and all of its virtual machines to crash, disrupting multiple applications.

With MCA recovery, when an uncorrectable data error is detected, the system can isolate the error to the affected VM only. The hardware notifies the VMM (supported by VMware vSphere 5.x), which then attempts to retire the failing memory page(s) and notify affected VMs and components.

If the failed page is in free memory then the page is retired and marked for replacement, and operation can return to normal. Otherwise, for each affected VM, if the VM can recover from the error it will continue operation; otherwise the VMM restarts the VM.

In all cases, once VM processing is done, the page is retired and marked, and operation returns to normal.

It is possible for the VM to notify its guest OS and have the OS take appropriate recovery actions, and even notify applications higher up in the software stack so that they take application-level recovery actions.
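The decision flow above can be summarised as a small handler. It is a hypothetical sketch of the logic as described, not the vSphere implementation: the function name and parameters are made up, and detecting the error, notifying guests and restarting VMs are reduced to return values.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "VM recovers and keeps running"
    RESTART = "VMM restarts the VM"


def handle_uncorrectable_error(page: int, owners: list[str],
                               can_recover: dict[str, bool],
                               retired_pages: set[int]) -> dict[str, Action]:
    """Hypothetical VMM handler for an MCA-reported uncorrectable error.

    owners: VMs whose memory includes the failing page (empty if the page
    is in free memory). can_recover: whether each VM can handle the error.
    """
    actions = {}
    for vm in owners:
        actions[vm] = Action.CONTINUE if can_recover.get(vm, False) else Action.RESTART
    # In all cases the failing page is retired and never handed out again.
    retired_pages.add(page)
    return actions


retired: set[int] = set()
print(handle_uncorrectable_error(0x1000, [], {}, retired))  # free page: {}
print(handle_uncorrectable_error(0x2000, ["vm-a", "vm-b"], {"vm-a": True}, retired))
print(sorted(retired))
```

The key property is visible in the second call: only the VMs touching the bad page are affected, and each one either recovers or is restarted, instead of the whole host going down.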

Here is a video demoing the MCA Recovery (MCAR) with VMware vSphere 5.0

Here is a diagram of the MCA recovery process:

Software-Assisted MCA Recovery Process - Courtesy of Intel

MCA Recovery is cool, but its main drawback is that it does not offer 100% memory reliability. The scrubbing process that goes through all memory pages to detect uncorrectable errors takes some time, and a few CPU cycles too.

If you’re fortunate, MCA Recovery detects the error and reports it to the VMM (VMware vSphere 5.x); otherwise you most probably end up with a purple screen of death.

Mirroring Mode

For 100% memory reliability, the bullion uses memory lockstep: in lockstep mode, data is written simultaneously to two different memory modules. It is the best memory protection mechanism for both reliability and availability, as it protects against both correctable and uncorrectable memory errors. On four-memory-channel systems such as the bullion, the number of available DIMM slots is cut in half.
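
Here is a toy Python model of the "written to two modules" behaviour described above, to show why both correctable and uncorrectable errors on one copy are survivable. It models two plain copies (mirroring) and is a simplification; the fault-injection set `bad_primary` is a hypothetical stand-in for a module returning an uncorrectable error.

```python
from typing import Dict, Set


class MirroredMemory:
    """Toy model: every write lands in two modules; reads can fall back."""

    def __init__(self) -> None:
        self.primary: Dict[int, int] = {}
        self.mirror: Dict[int, int] = {}
        self.bad_primary: Set[int] = set()  # addresses with an injected fault

    def write(self, addr: int, value: int) -> None:
        # Both copies are updated on every write, which is why only half
        # of the installed capacity is usable.
        self.primary[addr] = value
        self.mirror[addr] = value

    def read(self, addr: int) -> int:
        if addr in self.bad_primary:
            # Uncorrectable error on the primary copy: serve the read from
            # the mirror instead of raising a machine check.
            return self.mirror[addr]
        return self.primary[addr]
```

The cost is visible in `write`: every store is done twice, which is where both the halved capacity and part of the performance penalty come from.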

The bullion can hold up to 4 TB of memory, which is, surprisingly, double the maximum memory that VMware vSphere 5.1 supports so far ;)
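
One way to read the wink is as simple arithmetic: with mirroring enabled, a fully populated 4 TB bullion exposes half of that. The sketch below assumes the vSphere 5.1 per-host limit is the 2 TB implied by the sentence above.

```python
def usable_capacity_tb(installed_tb: float, mirrored: bool) -> float:
    """Capacity visible to the hypervisor; mirroring keeps two copies of every line."""
    return installed_tb / 2 if mirrored else installed_tb


# Assumption: 2 TB per host, i.e. half of the bullion's 4 TB maximum.
VSPHERE_51_MAX_TB = 2.0

bullion_max_tb = 4.0
print(usable_capacity_tb(bullion_max_tb, mirrored=False))  # 4.0
print(usable_capacity_tb(bullion_max_tb, mirrored=True))   # 2.0
```

In other words, a fully mirrored bullion lands exactly on the hypervisor limit.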

Memory Mirroring – Courtesy of Bull

Mirroring mode offers 100% memory reliability and availability, but it costs an arm, well, two arms and maybe a leg as well… Memory performance also drops by as much as 50%.

I’ve only covered a small subset of the many available RAS features. Below is the full list of advanced RAS features of the Intel Xeon processor E7 family.

Intel Xeon processor E7 family advanced RAS features – Courtesy of Intel

I’ve set up a little poll about the memory protection mechanism you rely on in your production environments. Thank you for taking the time to answer!

In the next post I will cover some other RAS features available in the bullion. Stay tuned!

Source: Bull, Intel, Wikipedia

Bull’s BCS Architecture – Deep Dive – Part 2

In Bull’s BCS Architecture – Deep Dive – Part 1, I listed the two key functionalities of the BCS: CPU caching and the resilient eXternal Node-Controller fabric.

Now let’s dive deep into these two key functionalities. Bear with me: it is quite technical.

Enhanced system performance with CPU Caching

CPU caching provides significant benefits for system performance:

  • Minimizes inter-processor coherency communication and reduces latency to local memory. Processors in each 4-socket module have access to the smart CPU cache state stored in the eXternal Node Controller, thus eliminating the overhead of requesting and receiving updates from all other processors.
  • Dynamic routing of traffic. When an inter-node-controller link is overused, Bull’s dynamic routing design avoids performance bottlenecks by routing traffic through the least-used path. The system uses all available lanes and maintains full bandwidth.
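
The "least-used path" idea can be sketched as a minimum-bottleneck choice: among candidate paths, pick the one whose busiest link is least loaded. Bull does not document its exact routing metric here, so this metric, the module names `M0` to `M2` and the load figures are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def pick_least_used_path(paths: List[Path],
                         link_load: Dict[Tuple[str, str], float]) -> Path:
    """Choose the path whose most loaded link is the least loaded.

    `link_load` maps (node_a, node_b) to a utilisation in [0, 1].
    The max-link-utilisation metric is an assumption, not Bull's spec.
    """
    def bottleneck(path: Path) -> float:
        return max(link_load[hop] for hop in zip(path, path[1:]))

    return min(paths, key=bottleneck)


# The direct link is saturated, so traffic detours through M2.
load = {("M0", "M1"): 0.95, ("M0", "M2"): 0.20, ("M2", "M1"): 0.30}
print(pick_least_used_path([("M0", "M1"), ("M0", "M2", "M1")], load))
# ('M0', 'M2', 'M1')
```

Using the bottleneck link rather than the hop count is what lets a longer but idle path win over a short, congested one.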

BCS Chip Design – Courtesy of Bull

With the Bull BCS architecture, thanks to CPU caching, coherency snoop responses consume only 5 to 10% of the bandwidth of the Intel QPI links and of the switch fabric. Bull’s implementation provides local memory access latency comparable to regular 4-socket systems, and 44% lower latency than 8-socket ‘glueless’ systems.

Via the eXtended QPI (XQPI) network, a Bull 4-socket module communicates with the other three modules as if they formed a single 16-socket system. Therefore all accesses to local memory have the bandwidth and latency of a regular 4-socket system. In fact, each BCS has an embedded directory of 144 SRAMs of 20 Mb each, for a total memory of 72 MB.

In addition, the BCS provides twice as many eXtended QPI links to interconnect additional 4-socket modules, whereas an 8-socket ‘glueless’ system only offers 4 Intel QPI links. Those links are also used more efficiently: by recording when a cache in a remote 4-socket module holds a copy of a memory line, the BCS eXternal Node-Controller can answer each source snoop on behalf of all remote caches. This keeps snoop traffic from consuming bandwidth AND reduces memory latency.
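
The directory mechanism described above is a form of snoop filtering. Below is a toy Python sketch of the idea, assuming a node controller that tracks, per locally homed memory line, which remote modules hold a cached copy. The class and method names are hypothetical, and the real BCS directory is hardware SRAM, not a hash map.

```python
from collections import defaultdict
from typing import Dict, Set


class NodeControllerDirectory:
    """Toy directory kept by one module's node controller (hypothetical).

    A snoop is forwarded only to remote modules recorded as holding the
    line; if none do, the controller answers on their behalf and no
    traffic crosses the fabric.
    """

    def __init__(self, module_id: int) -> None:
        self.module_id = module_id
        self.sharers: Dict[int, Set[int]] = defaultdict(set)

    def record_remote_fill(self, line: int, remote_module: int) -> None:
        # A remote module fetched this line into one of its CPU caches.
        self.sharers[line].add(remote_module)

    def record_remote_evict(self, line: int, remote_module: int) -> None:
        self.sharers[line].discard(remote_module)

    def snoop(self, line: int) -> Set[int]:
        """Remote modules that must really be snooped (empty: answered locally)."""
        return set(self.sharers.get(line, set()))


d = NodeControllerDirectory(module_id=0)
d.record_remote_fill(line=0x40, remote_module=2)
print(d.snoop(0x40))  # {2}: only module 2 is snooped
print(d.snoop(0x80))  # set(): answered locally, no fabric traffic
```

Compared with broadcasting every snoop to all three remote modules, most snoops in this model generate either one targeted message or none at all, which is the bandwidth and latency saving the text describes.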

Enhanced reliability with Resilient System Fabric

The Bull BCS architecture extends the advanced reliability of the Intel Xeon processor E7-4800 series with a resilient eXtended-QPI (X-QPI) fabric. The BCS X-QPI fabric enables:

  • No more hops to reach the information inside any of the other processor caches.
  • Redundant data paths. Should an X-QPI link fail, a redundant X-QPI link automatically takes over.
  • Rapid recovery with improved error logging and diagnostics information.
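
The redundant-path behaviour in the list above can be modelled as "use the first healthy link and log every failure". This is a deliberately simplified Python sketch; the link names and the error-log format are hypothetical and say nothing about how the BCS actually selects or logs links.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    name: str
    healthy: bool = True


class RedundantPath:
    """Toy module-to-module connection with spare X-QPI links (hypothetical)."""

    def __init__(self, links: List[Link]) -> None:
        self.links = links
        self.errors: List[str] = []  # stands in for the error log

    def mark_failed(self, name: str) -> None:
        for link in self.links:
            if link.name == name:
                link.healthy = False
                self.errors.append(f"link {name} failed; rerouting")

    def active_link(self) -> Link:
        # Traffic automatically moves to the first healthy link.
        for link in self.links:
            if link.healthy:
                return link
        raise RuntimeError("no healthy X-QPI link left between modules")
```

Failover here needs no action from software above the fabric, which matches the claim that the system keeps running, while the log gives the service team something to diagnose later.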

Bullion Multi-Modules BCS Design – Courtesy of Bull

What about RAS features?

Bull designed the BCS with RAS (Reliability, Availability and Serviceability) features consistent with Intel’s QPI RAS features.

The point-to-point links that connect the chips in the bullion system, found in QPI, the Scalable Memory Interconnect (SMI) and the BCS fabric, have many RAS features in common, including:

  • Cyclic Redundancy Check (CRC)
  • Link Level Retry (LLR)
  • Link Width Reduction (LWR)
  • Link Retrain

All the link resiliency features above apply to both Intel QPI/SMI and the X-QPI fabric (BCS). They are transparent to the hypervisor, and the system remains operational.
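
CRC and Link Level Retry work together: the receiver checks each transfer's CRC, and on a mismatch the sender replays it from a retry buffer. The Python sketch below illustrates that pairing. It uses `zlib.crc32` purely as a stand-in (QPI uses its own, shorter per-flit CRC), and the `flaky_channel` function is a hypothetical link that corrupts only its first transfer. Link Width Reduction and Link Retrain are not modelled beyond the error raised when retries run out.

```python
import zlib
from typing import Callable, Optional


def frame(payload: bytes) -> bytes:
    """Append a CRC (zlib.crc32 as a stand-in for the link's own CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check(framed: bytes) -> Optional[bytes]:
    """Return the payload if the CRC matches, else None (corrupted)."""
    payload, crc = framed[:-4], framed[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "little") == crc else None


def send_with_retry(payload: bytes, channel: Callable[[bytes], bytes],
                    max_retries: int = 3) -> bytes:
    """Link Level Retry: replay from the sender's buffer on CRC failure."""
    retry_buffer = frame(payload)  # kept until the receiver acknowledges
    for _ in range(max_retries + 1):
        received = check(channel(retry_buffer))
        if received is not None:
            return received  # ack: transfer succeeded
    # A real link would escalate here, e.g. to a link retrain.
    raise RuntimeError("retries exhausted")


calls = {"n": 0}


def flaky_channel(data: bytes) -> bytes:
    # Hypothetical link: flips one bit on the first transfer only.
    calls["n"] += 1
    if calls["n"] == 1:
        return bytes([data[0] ^ 0x01]) + data[1:]
    return data


print(send_with_retry(b"cache line", flaky_channel))  # b'cache line'
print(calls["n"])  # 2
```

The payload reaches the receiver intact after one replay, and nothing above the link layer ever sees the error, which is why these mechanisms are transparent to the hypervisor.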

XQPI Cabling for a 16-Socket bullion – Courtesy of Bull

In Part 3, I will write about how Bull improves memory reliability by forwarding memory error detections right into the VMware hypervisor to avoid the purple screen of death. This is not science fiction! It is available in a shop near you :)

Source: Bull, Intel