Even before its name was unveiled last week, and well ahead of the official announcement due later in the year, Intel's Xeon Phi shook the supercomputer world by reportedly snatching some large upcoming projects, mostly away from Nvidia Tesla. What about its performance?
Intel's MIC, now officially christened Xeon Phi, is known to have been seeded for a while across a few dozen test sites worldwide, including our region, mostly in supercomputers and workstations as a computational accelerator. While the performance of the seed units was either not that great or plain and simple confidential, its X86 + SIMD base was supposed to help programmers use it comfortably, without the CUDA or OpenCL complications.
And it did work: according to our high-level sources in both China and Singapore, the difference in code porting time is huge in some cases, with a few months of work to handle CUDA becoming a few days to complete the MIC code port.
So, right after the ISC supercomputer show in Germany, there were two interesting updates from our friends.
First, Intel's Xeon Phi has managed to kick out the GPUs as the FP (floating point) accelerator in several very large upcoming deals worldwide, including some in the 100 PFLOPs range, for 2013 and 2014. We are talking here about the replacements for machines currently ranked in the single digits of the TOP 500 list, which right now use Nvidia Tesla as the accelerator. A 100 PFLOPs supercomputer composed of equal numbers of, say, Ivy Bridge EP Xeons and Xeon Phis, i.e. each dual-CPU node having dual Phis, would have in excess of 80,000 Xeon CPUs and 80,000 MICs for the users to play with, as a single machine. Most importantly, like it or not, as a single homogeneous X86 instruction set machine.
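For readers who want to check the "in excess of 80,000" figure, here is a rough back-of-the-envelope sketch. It takes the roughly 1 TFLOPs DP peak quoted for the first MIC and assumes about 0.2 TFLOPs DP peak per Ivy Bridge EP Xeon; that CPU number is our own guess for illustration, not an Intel figure, and the function name is made up for this sketch.

```python
import math

def pairs_needed(target_pflops, phi_tflops, xeon_tflops):
    """Return how many Xeon + Xeon Phi pairs reach the target peak.

    Peak figures simply add up; real sustained performance would be lower.
    """
    per_pair_tflops = phi_tflops + xeon_tflops
    return math.ceil(target_pflops * 1000 / per_pair_tflops)

TARGET_PFLOPS = 100.0  # machine size discussed in the article
PHI_TFLOPS = 1.0       # "slightly over 1 TFLOPs" quoted for the first MIC
XEON_TFLOPS = 0.2      # ASSUMPTION: rough DP peak of one Ivy Bridge EP Xeon

pairs = pairs_needed(TARGET_PFLOPS, PHI_TFLOPS, XEON_TFLOPS)
nodes = math.ceil(pairs / 2)  # each node holds two CPUs and two Phis

print(f"Xeon CPUs: {pairs}, Xeon Phis: {pairs}, nodes: {nodes}")
```

Under these assumptions the count lands a little above 83,000 of each chip, in line with the estimate above. A faster final Phi bin would pull the number down, while counting sustained rather than peak performance would push it up.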
But wait, you might say: won't the upcoming year-end Tesla K20, based on the GK110, have higher DP FP peak performance, at 1.5 TFLOPs compared to just slightly over 1 TFLOPs for the first MIC? So shouldn't Nvidia keep the performance lead?
Well, that's where the second point from our sources comes in. Intel is still tuning the clock speeds and final performance of the initial generation Xeon Phi and, by the time of the November SC show in the States, it might come much, much closer to the figures claimed by Nvidia. Also, with Intel's process resources and binning capabilities, don't be surprised to see a few different MIC speed bins and even memory sizes, including a larger 16 GB RAM model which Nvidia can't match with the GK110 for now.
What should Nvidia do? Well, they have to figure out how to handle being squeezed between the Xeon Phi's 'common X86 ease of use' and AMD's Graphics Core Next GPUs, which offer a teraflop of DP FP at consumer prices. Then, picking up ARM may not prove to have been such a smart idea in the end, at least when it comes to high-end computing. Simply put, among the RISC architectures, ARM's multi-decade layers of baggage and committee-style decision making don't help it scale past X86, and there are much leaner RISC designs, like MIPS or Alpha, lurking around, the last two very much at the high end, coming from China…