The freshly launched Xeon E5 breaks all the performance records in the two-socket x86 space, as Lennard's server review also shows. However, compute density is critical for winning large-scale HPC contracts too. What did Intel develop to fit all this hardware into the smallest space without compromising performance?
The Socket 2011 'Sandy Bridge EP' Intel Xeon E5 processors are fast, with 8 cores, 20 MB L3 cache, and plenty of memory and I/O resources to boot. They also inherit a large installed base of previous generation Xeons across the whole range of workstations and servers, from mini blades to supercomputers.
However, when evaluating platforms for high performance computing projects, especially medium to large supercomputers, the achievable power and density play a big role too. We can't do much about the power here: the larger 416 mm² dies required to fit all the features do take their toll, with the fastest server part, the E5-2690, specified at 135 W TDP at 2.9 GHz, plus Turbo, of course.
The density can be worked on, though. Just figure out how to route four memory channels, as well as all the QPI and PCIe v3 lanes coming out of each CPU, on the smallest possible dual-socket board layout. Then engineer the cooling and power feed, not to mention the interconnect between the boards, to squeeze many dozens of those boards into each 42U or similarly tall compute rack.
Here's Intel's solution – look how compact it feels compared to a typical high-end PC mainboard:
A small, half-width 6.4 inch by 17.7 inch board contains it all – two CPUs with full bandwidth memory, three full x16 PCIe v3 slots, I/O riser connections for storage, as well as a combination of dual GigE plus a ConnectX InfiniBand QDR (40 Gbps) or FDR (56 Gbps) high speed interconnect. Quite impressive, isn't it?
Now, two of these boards can easily fit side by side into one 1U rackmount server chassis and, depending on how much space you need for other hardware, up to 40 such chassis could fit into a standard 42U rack. So, that makes up to 80 high speed dual-CPU servers, each with ~360 GFLOPS double precision FP peak performance and up to 256 GB ECC RAM, in just one rack, or a nearly 30 TFLOPS single-rack cluster if you like it counted that way. Do keep in mind that, with at least 400 W per node, such a cluster would also need a good 30 kW of power at peak usage… but at least you can pack them tight.
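For readers who want to redo the rack arithmetic with their own numbers, here is a rough back-of-the-envelope sketch in Python. It assumes 8 double precision FLOPs per core per clock cycle (one 4-wide AVX add plus one 4-wide AVX multiply) and the 2.9 GHz base clock without Turbo; the function names are made up for illustration. With those assumptions the per-node peak comes out slightly above the rounded ~360 GFLOPS figure quoted above, while the rack totals land on the same "nearly 30 TFLOPS" and "a good 30 kW".

```python
# Back-of-the-envelope peak numbers for one rack of half-width dual-socket nodes.
# Assumptions (not from Intel's spec sheet): 8 DP FLOPs per core per cycle with AVX,
# base clock only (no Turbo), and a flat 400 W per node.

def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical peak double precision GFLOPS of a single node."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


def rack_totals(chassis_per_rack, nodes_per_chassis, node_gflops, node_watts):
    """Return (node count, peak TFLOPS, peak kW) for a fully populated rack."""
    nodes = chassis_per_rack * nodes_per_chassis
    return nodes, nodes * node_gflops / 1000.0, nodes * node_watts / 1000.0


if __name__ == "__main__":
    # Dual E5-2690: 2 sockets, 8 cores each, 2.9 GHz base clock.
    node = peak_gflops(sockets=2, cores_per_socket=8, clock_ghz=2.9, flops_per_cycle=8)
    # 40 1U chassis per 42U rack, two half-width boards per chassis.
    nodes, tflops, kilowatts = rack_totals(40, 2, node, node_watts=400)
    print(f"per node: {node:.1f} GFLOPS")
    print(f"per rack: {nodes} nodes, {tflops:.1f} TFLOPS, {kilowatts:.1f} kW")
```

Swap in a lower sustained clock or a higher per-node wattage and the rack totals shift accordingly; these are theoretical peaks, not measured results.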