Two Main Scale-Up Server Architectures – Part 1

To address increasingly demanding workloads, processor sockets are added seamlessly within a single server: this is scaling up. The sockets are interconnected, along with the memory and IO boards, so that applications can benefit from more compute power.

Refer to the first article of this series: Scale-Out And Scale-Up Architectures – The Business-Critical Application Point Of View

There are two broad scale-up server architectures:

  • the “glueless” architecture
  • the “glued” architecture

The “glueless” architecture

The “glueless” architecture was designed by Intel and is implemented in the Intel Xeon E7 series.

When building servers of 4 sockets and above, the processor sockets are connected directly to one another through the Intel QPI links.

The Intel QPI links are used to access memory, IO and networks, as well as the other processors.

A “glueless” socket has four Intel QPI links. It uses one of them to connect the processor socket to IO and the remaining three to interconnect the processor sockets.

4-socket glueless architecture – Courtesy of Bull

In an 8-socket configuration, each processor socket connects directly to three other sockets, while the connections to the other four processor sockets are indirect and must pass through an intermediate socket.

8-socket glueless architecture – Courtesy of Bull
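
The direct and indirect connections described above can be explored with a small graph model. This is a minimal sketch, not the actual wiring: the 4-socket case is modelled as a full mesh, and the 8-socket case uses a hypothetical layout in which each socket links to its two ring neighbours and to the opposite socket. That layout is only an assumption chosen because it uses exactly three socket-to-socket links per processor; the function names are illustrative too.

```python
from collections import deque


def full_mesh(n):
    """Every socket has a direct link to every other socket."""
    return {s: [p for p in range(n) if p != s] for s in range(n)}


def ring_with_chords(n=8):
    """Hypothetical 8-socket wiring: two ring neighbours plus the opposite socket.

    Uses exactly three socket-to-socket links per processor. This is an
    assumption for illustration, not the wiring documented by Intel or Bull.
    """
    return {s: [(s - 1) % n, (s + 1) % n, (s + n // 2) % n] for s in range(n)}


def hops_from(topology, source):
    """Breadth-first search returning the hop count to every other socket."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for peer in topology[current]:
            if peer not in dist:
                dist[peer] = dist[current] + 1
                queue.append(peer)
    del dist[source]  # keep only the other sockets
    return dist


for name, topo in (("4-socket", full_mesh(4)), ("8-socket", ring_with_chords(8))):
    hops = hops_from(topo, 0)
    direct = sorted(s for s, h in hops.items() if h == 1)
    indirect = sorted(s for s, h in hops.items() if h > 1)
    print(name, "direct:", direct, "indirect:", indirect)
```

Under this assumed wiring, socket 0 reaches three peers in one hop and the remaining four through an intermediate socket, which matches the split described in the text. Every access to one of those four sockets consumes bandwidth on two links instead of one.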

The advantages of a “glueless” architecture:

  • no specific development or expertise is required from the server manufacturer; any server maker can build an 8-socket server
  • as a result, 4-socket and 8-socket servers cost less

The disadvantages of a “glueless” architecture:

  • the TCO goes up when scaling out
  • limited to 8-socket servers
  • cache coherency becomes difficult to maintain as the number of sockets increases
  • performance does not increase linearly
  • price/performance ratio deteriorates
  • efficiency not optimal when running large VMs
  • up to 65% of Intel QPI link bandwidth is consumed by the QPI source broadcast snoopy protocol

What’s the issue with the Intel QPI source broadcast snoopy protocol? To achieve cache coherency, a read request must be reflected to all processor caches as a snoop. You can compare this to a broadcast on an IP network. Each processor must check its cache for the requested memory line and provide the data if it holds the most up-to-date version. When the latest version is available in another cache, the source broadcast snoopy protocol provides the minimum latency for copying the memory line from one cache to the next. The drawback is that every read results in snoops to all other caches. These snoop packets consume cache cycles and link bandwidth that would otherwise be used for data transfers.
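
To make the cost of the broadcast concrete, here is a minimal sketch of a source broadcast read. It is a deliberately simplified model and not Intel's actual QPI protocol: cache states are reduced to a single dirty flag, every read is assumed to miss the requester's own cache, and the names `Cache`, `Stats` and `read_line` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """One processor cache: maps a line address to (value, dirty flag)."""
    lines: dict = field(default_factory=dict)


@dataclass
class Stats:
    snoops: int = 0
    data_transfers: int = 0


def read_line(requester, caches, memory, address, stats):
    """Source broadcast read: snoop every other cache, keep the freshest copy."""
    value = memory[address]
    for socket, cache in enumerate(caches):
        if socket == requester:
            continue
        stats.snoops += 1                    # one snoop per remote cache, hit or miss
        entry = cache.lines.get(address)
        if entry is not None and entry[1]:   # a dirty copy is newer than memory
            value = entry[0]
    stats.data_transfers += 1                # a single data transfer answers the read
    caches[requester].lines[address] = (value, False)
    return value


for sockets in (2, 4, 8):
    caches = [Cache() for _ in range(sockets)]
    caches[-1].lines[0x40] = ("new", True)   # last socket holds a modified copy
    memory = {0x40: "old"}
    stats = Stats()
    value = read_line(0, caches, memory, 0x40, stats)
    print(sockets, "sockets:", stats.snoops, "snoops,",
          stats.data_transfers, "data transfer, value =", value)
```

In this model a single read costs one snoop per remote cache, so the snoop count grows with the number of sockets while the useful payload stays at one data transfer.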

The primary workloads affected by the Intel QPI source broadcast snoopy issue are:

  • Java applications
  • large databases
  • latency sensitive applications

A scale-up approach should not introduce any bottleneck, otherwise the architecture is useless. The performance gain should therefore grow linearly, in line with the added resources.
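
One way to express this linearity requirement is as a scaling efficiency: the measured speedup divided by the ratio of added resources. The sketch below is only an illustration; the function name is hypothetical and the throughput figures are made-up placeholders, not measurements.

```python
def scaling_efficiency(base_sockets, base_throughput, scaled_sockets, scaled_throughput):
    """Ratio of measured speedup to added resources; 1.0 means perfectly linear."""
    speedup = scaled_throughput / base_throughput
    resource_ratio = scaled_sockets / base_sockets
    return speedup / resource_ratio


# Hypothetical throughput figures, for illustration only.
print(scaling_efficiency(4, 100.0, 8, 200.0))  # 1.0: doubling sockets doubles throughput
print(scaling_efficiency(4, 100.0, 8, 150.0))  # 0.75: a quarter of the added capacity is lost
```

A value well below 1.0 when going from 4 to 8 sockets would point to the kind of bottleneck described above.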

In the next part, we will discuss the “glued” architecture and how it can address the drawbacks of the “glueless” architecture while keeping performance in line with the added resources.

Source: Bull, Intel, Wikipedia

About PiroNet

Didier Pironet is an independent blogger and freelancer with 15+ years of IT industry experience. Didier is also a former VMware Inc. employee, where he specialised in Datacenter and Cloud Infrastructure products as well as Infrastructure, Operations and IT Business Management products. Didier is passionate about technology and is regarded as a creative and visionary thinker who communicates with passion and excitement, hoping to inspire people and win them over to innovation and change.
