Two Main Scale-Up Server Architectures – Part 1


To address increasingly demanding workloads, processor sockets are added seamlessly within a single server: you are scaling up. The sockets are interconnected, along with the memory and IO boards, so applications can benefit from more compute power.

Refer to the first article in this series: Scale-Out And Scale-Up Architectures – The Business-Critical Application Point Of View.

There are two broad scale-up server architectures:

  • the “glueless” architecture
  • the “glued” architecture

The “glueless” architecture

The “glueless” architecture was designed by Intel. It is implemented in the Intel Xeon E7 series.

When building servers of 4 sockets and above, the processor sockets are connected together directly through the Intel QPI links.

The Intel QPI links are used to access memory, IO, and the network, as well as the other processors.

A “glueless” socket uses one of its four Intel QPI links to connect the processor socket to IO, and the remaining three Intel QPI links to interconnect the processor sockets.

4-socket glueless architecture – Courtesy of Bull

In an 8-socket configuration, each processor socket connects directly to three other sockets, while the connections to the remaining four processor sockets are indirect.

8-socket glueless architecture – Courtesy of Bull
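The direct-versus-indirect distinction can be sketched as a small graph model. The snippet below is illustrative only: the 4-socket case is a full mesh (every socket one QPI hop away), while the 8-socket case uses a Möbius-ladder wiring (a ring plus "across" links) as one possible 3-regular topology, which is an assumption on my part — actual server wiring varies by vendor.

```python
from collections import deque

def worst_case_hops(adjacency):
    """BFS from each socket; return the worst-case hop count between any pair."""
    worst = 0
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in adjacency[node]:
                if peer not in dist:
                    dist[peer] = dist[node] + 1
                    queue.append(peer)
        worst = max(worst, max(dist.values()))
    return worst

# 4-socket glueless: full mesh, every socket one QPI hop from every other.
mesh4 = {s: [p for p in range(4) if p != s] for s in range(4)}

# 8-socket glueless: one QPI link per socket goes to IO, leaving three
# links for peers. This hypothetical ring-plus-chords wiring reaches any
# remote socket in at most two hops.
ring8 = {s: [(s - 1) % 8, (s + 1) % 8, (s + 4) % 8] for s in range(8)}

print(worst_case_hops(mesh4))  # 1 hop: all sockets directly connected
print(worst_case_hops(ring8))  # 2 hops: four sockets are reached indirectly
```

With three inter-socket links per processor, any 8-socket topology necessarily leaves some socket pairs at two hops, which is the indirect connection the text describes.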

The advantages of a “glueless” architecture:

  • no specific development or expertise required from the server manufacturer; any server maker can build an 8-socket server
  • as a result, the cost of 4-socket and 8-socket servers is also lower

The disadvantages of a “glueless” architecture:

  • the TCO goes up when scaling out
  • limited to 8-socket servers
  • cache coherency becomes difficult to maintain as the socket count increases
  • performance does not increase linearly
  • price/performance ratio decreases
  • efficiency is not optimal when running large VMs
  • up to 65% of the Intel QPI link bandwidth can be consumed by the QPI source broadcast snoopy protocol

What’s the issue with the Intel QPI source broadcast snoopy protocol? To achieve cache coherency, a read request must be reflected to all processor caches as a snoop; you can compare this to a broadcast on an IP network. Each processor must check for the requested memory line and provide the data if it holds the most up-to-date version. When the latest version is available in another cache, the source broadcast snoopy protocol provides the minimum latency, as the memory line is copied directly from one cache to the other. However, because every read results in snoops to all the other caches, these snoop packets consume cache cycles and link bandwidth that would otherwise be used for data transfers.
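The bandwidth cost of broadcasting snoops can be made concrete with a back-of-envelope model. The packet sizes below are arbitrary assumptions of mine, not Intel's published figures; the point is only that snoop traffic grows with the socket count while the useful data per read stays constant.

```python
# Illustrative model: in a source-broadcast snoop protocol, every read
# triggers a snoop to each of the other sockets.
SNOOP_PACKET = 1   # assumed units per snoop request/response pair
DATA_PACKET = 8    # assumed units for the cache line itself

def snoop_overhead(sockets):
    """Fraction of link traffic spent on snoops rather than data."""
    snoops = (sockets - 1) * SNOOP_PACKET
    return snoops / (snoops + DATA_PACKET)

for n in (2, 4, 8):
    print(f"{n} sockets: {snoop_overhead(n):.0%} of link traffic is snoops")
```

Even with these made-up numbers, the overhead rises steeply from 2 to 8 sockets, which is why the snoopy protocol becomes the limiting factor in larger glueless configurations.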

The primary workloads affected by the Intel QPI source broadcast snoopy issue are:

  • Java applications
  • large databases
  • latency sensitive applications

No bottleneck should result from a scale-up approach, otherwise the architecture is useless. Thus performance should increase linearly with the added resources.
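One simple way to quantify that linearity is a scaling-efficiency ratio. The throughput figures below are hypothetical, chosen only to show the arithmetic:

```python
# Hypothetical throughput measurements (arbitrary units) — not real benchmarks.
baseline_4s = 100.0   # assumed throughput at 4 sockets
measured_8s = 150.0   # assumed throughput at 8 sockets

# Perfectly linear scale-up would double throughput from 4 to 8 sockets.
ideal_8s = baseline_4s * (8 / 4)
efficiency = measured_8s / ideal_8s
print(f"scaling efficiency: {efficiency:.0%}")  # prints "scaling efficiency: 75%"
```

An efficiency well below 100% means the added sockets are partly wasted, which is exactly the non-linear scaling listed among the glueless drawbacks.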

In the next part, we will discuss the “glued” architecture and how it addresses the drawbacks of the “glueless” architecture while maintaining linear performance.

Source: Bull, Intel, Wikipedia
