Nvidia is preparing to unveil its highest-end Kepler part at SC'12, but a lot of details were hidden from the public… until now. Thanks to specifications published prematurely by CADnetworks, we have learned everything that was missing.
German publication Heise.de was the first to pick up the specifications of CADnetworks' new visual workstation, the ProViz W60, and the G13, G26 and G40 servers. The specifications only confirm what we suspected: Nvidia was coy about disclosing the amount of memory on the K20 boards because it will actually end up lower than that of its predecessor, the Tesla C2075. We suspected this might happen, as both Samsung and Hynix declined to manufacture 4Gbit GDDR5 chips, citing lack of interest. According to our sources, the reason was a bit different, and quite unflattering for both Samsung and Hynix: both companies asked Nvidia for money to bring high-density GDDR5 to market, but refused to grant the company an exclusivity deal. At least, that's how one side of the story goes.
Faced with the availability of only 1Gbit and 2Gbit GDDR5 chips, it doesn't take a rocket scientist to figure out that the K20 could ship with no more than six physical gigabytes, i.e. five workable gigabytes once ECC is enabled (as it is by default).
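The capacity ceiling is simple arithmetic: number of memory chips times chip density, divided by eight to convert gigabits to gigabytes. The sketch below is a rough illustration only; the chip count of 24 is an assumption chosen to match the six-gigabyte figure, not something stated in the leaked specifications, and the function name is hypothetical.

```python
def board_capacity_gb(num_chips: int, density_gbit: int) -> float:
    """Total board memory in gigabytes for identical GDDR5 chips."""
    return num_chips * density_gbit / 8  # 8 gigabits per gigabyte


# ASSUMPTION: 24 chips per board (not given in the source).
ASSUMED_CHIPS = 24

for density in (1, 2, 4):  # chip densities in Gbit; 4Gbit parts were never built
    print(f"{density}Gbit chips: {board_capacity_gb(ASSUMED_CHIPS, density)} GB")
```

Under that assumption, 2Gbit chips top out at 6 GB, while the 4Gbit parts that never reached the market would have allowed 12 GB on the same board.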
Thus, the K20 board features a 7.1-billion-transistor chip manufactured on TSMC's 28nm process, once again the world's largest piece of silicon (a title Nvidia has held since 2006 and the G80 chip). The chip in question has 13 SMX clusters enabled, each carrying 192 cores, for a grand total of 2,496 CUDA cores. That number trounces the GK104 (available in the GeForce GTX 660 Ti/670/680/690, Quadro K5000 and Tesla K10) by a significant margin, as the GK104 carries "just" 1,536 cores. The chip physically carries 15 SMX clusters, two of which are disabled for yield purposes. If TSMC improves its 28nm process, we might see a K25 or K30 with the full 2,880 CUDA cores, but don't expect too much.
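The core counts quoted here all follow from the 192-cores-per-SMX figure, and the same figure shows that a full 2,880-core part corresponds to 15 SMX clusters. A quick sketch of that arithmetic follows; the function names are illustrative, and the GK104's eight SMX clusters are an assumption taken from Nvidia's public Kepler material rather than from the leaked specifications.

```python
CORES_PER_SMX = 192  # per the article


def cuda_cores(smx_enabled: int) -> int:
    """CUDA cores for a given number of enabled SMX clusters."""
    return smx_enabled * CORES_PER_SMX


def smx_count(total_cores: int) -> int:
    """SMX clusters implied by a total CUDA core count."""
    return total_cores // CORES_PER_SMX


print(cuda_cores(13))   # K20 as specified: 2496
print(smx_count(2880))  # full chip: 15 clusters, so 2 are disabled on the K20
print(cuda_cores(8))    # GK104, ASSUMING 8 SMX clusters: 1536
```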
The number of cores is not the only change, though: Nvidia worked hard on optimizing double-precision performance, but fell short of its self-imposed target of 1.5 TFLOPS DP. As it stands right now, the K20 will barely outperform the Xeon Phi (ex-Larrabee, now Knights Corner / Knights Ferry) and AMD's own FirePro S9000 (which is hidden so well on AMD's website that you can't find it unless you use a search engine).
The GPU will run at only 705MHz, meaning that you can expect double-precision performance to the tune of 1.17 TFLOPS, barely double that of the C2075/M2075/M2090. Single-precision performance is quite solid at 3.52 TFLOPS, but still lags behind AMD's Tahiti GL GPU (4 TFLOPS single, 1 TFLOPS double).
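These peak figures can be reproduced with the usual back-of-the-envelope formula: execution units times two floating-point operations per clock (one fused multiply-add) times clock speed. The sketch below assumes 64 double-precision units per SMX, a figure from Nvidia's GK110 material and not from the leaked specifications; the names are illustrative only.

```python
def peak_tflops(units: int, clock_mhz: int) -> float:
    """Theoretical peak TFLOPS, counting one fused multiply-add as 2 FLOPs."""
    return units * 2 * clock_mhz / 1e6  # units * FLOPs/clock * MHz -> TFLOPS


SMX_ENABLED = 13       # per the article
SP_CORES_PER_SMX = 192  # per the article
DP_UNITS_PER_SMX = 64   # ASSUMPTION: not stated in the source
CLOCK_MHZ = 705         # per the article

sp = peak_tflops(SMX_ENABLED * SP_CORES_PER_SMX, CLOCK_MHZ)
dp = peak_tflops(SMX_ENABLED * DP_UNITS_PER_SMX, CLOCK_MHZ)
print(f"single precision: {sp:.2f} TFLOPS")  # 3.52
print(f"double precision: {dp:.2f} TFLOPS")  # 1.17
```

By the same formula, reaching 1.5 TFLOPS DP with 13 SMX clusters would take a clock of roughly 900MHz, which gives a sense of how far the 705MHz part sits from that target.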
The price is set at the already announced $3,199, or 2,950 euros.
All in all, when it comes to the 2013 accelerator / GPGPU battle, we can now say that the Tesla K20 is nowhere near as impressive as the company hoped it would be: Larrabee Reborn (Xeon Phi) and the FirePro S9000 are too close for comfort, and with Intel's pricing schemes, it is no wonder Nvidia is worried about its neighbor on the east side of the 101 freeway.