With the long-awaited release of Kaveri, the merger between AMD and ATI is complete.
When AMD announced it was acquiring ATI in 2006, it did so with a single vision in mind: to eventually unify the CPU and the GPU.
The product of this project has been the Accelerated Processing Unit, or APU. Combining the CPU and GPU on the same die, AMD argued, would bring untold advantages in performance, with the two working together in harmony as a single compute engine.
Project Fusion, as AMD called this endeavor, first brought us Llano. Announced at the Consumer Electronics Show in 2011, the chip combined K10 CPU cores and Radeon 6000 GPU cores on the same die. The follow-up to Llano, Trinity, was released in October 2012 and featured a mix of Piledriver CPU cores and Radeon 7000 GPU cores.
This line of thinking wasn’t lost on Intel. Following AMD’s cues, it began placing GPUs and CPUs on the same die. As AMD’s Roy Taylor boasted to VR-Zone in a July interview, “An APU, to us, is any processor which includes a serial processor — a CPU and GPU in one package. By that definition, not only are Trinity and Richland APUs but so are Ivy Bridge, Sandy Bridge and Haswell.”
With Kaveri, AMD is hoping to launch the next step in its APU project: hUMA and HSA.
Heterogeneous Uniform Memory Access (hUMA) promises that the CPU and GPU will have direct access to the same memory pool, allowing both processors to cooperate at the hardware level. In theory a companion feature, heterogeneous queuing (hQ), will eliminate the bottleneck of the CPU having to go through the operating system layer and memory to allocate tasks to the GPU.
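The difference between separate and shared memory pools can be pictured with a loose analogy in plain Python. This is only a sketch and not HSA or GPU code: the function names are hypothetical, the "GPU work" is an ordinary loop, and a `memoryview` stands in for two processors addressing the same bytes.

```python
def offload_with_copies(host_data: bytearray) -> bytearray:
    """Separate memory pools: stage a copy, work on it, copy the result back."""
    device_buffer = bytearray(host_data)          # copy "host" -> "device"
    for i in range(len(device_buffer)):
        device_buffer[i] = (device_buffer[i] * 2) % 256
    return bytearray(device_buffer)               # copy "device" -> "host"


def offload_shared(host_data: bytearray) -> bytearray:
    """Shared memory pool: the worker gets a view of the same bytes, no copies."""
    shared_view = memoryview(host_data)           # zero-copy view of the buffer
    for i in range(len(shared_view)):
        shared_view[i] = (shared_view[i] * 2) % 256
    return host_data                              # the same object, updated in place
```

In the first function the data crosses the boundary twice; in the second, both sides read and write one buffer, which is the property hUMA is meant to provide in hardware.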
AMD’s other project, HSA, or Heterogeneous System Architecture, refers to programming methods that are hardware agnostic and thus target mixed-mode computing. In theory, with HSA, a programmer could write code that supports calls to any HSA-compliant hardware. Time will be saved as repetition is reduced under the banner of “one code to rule them all.”
While both hUMA and HSA are promising technologies, the fact is that at launch Kaveri didn’t ship with either fully enabled. Granted, AMD provided a number of optimized benchmarks and demos of HSA in action, but because the technology is so young and not yet widely adopted, an objective demo is impossible.
The same goes for Mantle. AMD’s development tool, which gives coders console-like “close to metal” access to the GCN GPU core, is promising, but its flagship title, Battlefield 4, won’t have a Mantle version out until later this month.
So without these key parts of the promised Kaveri advantage available it feels almost unfair to benchmark and review the chip now. The chip is not, after all, all it can be without a full implementation of this promised technology.
Nevertheless, AMD has released Kaveri to the market, so a full review of the chip is warranted.
AMD has released Kaveri in two variants for its high-end A10 series of chips: the A10-7850K and the A10-7700K. A third variant, the A8-7600, was released for AMD’s low-power line. The A10-7850K has a maximum clock speed of 4.0 GHz with a base clock of 3.7 GHz, while the A10-7700K has a maximum clock speed of 3.8 GHz with a base clock of 3.4 GHz. The low-power A8-7600 has a maximum clock speed of 3.8 GHz with a base clock of 3.3 GHz.
All of the chips have a GPU frequency of 720 MHz. The A10-7850K has eight GPU cores, while the A10-7700K and the A8-7600 have six GPU cores each.
Below is a provided die shot, showing how the CPU and GPU cores populate the die:
Thanks to a new 28nm process node developed in conjunction with GlobalFoundries, AMD says it has managed to nearly double the number of transistors on die while slightly shrinking the die size. Previous APUs based on the Piledriver and Richland architecture used a 32nm process node and packed 1.3 billion transistors into 246 square mm, while Kaveri has 2.41 billion transistors in 245 square mm.
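As a quick check on those figures, the transistor counts and die areas quoted above can be turned into a density comparison. This is a minimal sketch; the function name is our own, and the inputs are only the numbers AMD supplied.

```python
def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per square millimeter."""
    return transistors / area_mm2 / 1e6


# Figures quoted in the article.
previous = density_millions_per_mm2(1.3e9, 246)   # 32nm Piledriver/Richland APUs
kaveri = density_millions_per_mm2(2.41e9, 245)    # 28nm Kaveri

count_ratio = 2.41e9 / 1.3e9      # growth in transistor count
density_ratio = kaveri / previous  # growth in transistors per mm^2

print(f"Previous: {previous:.2f} M/mm^2, Kaveri: {kaveri:.2f} M/mm^2")
print(f"Transistor count x{count_ratio:.2f}, density x{density_ratio:.2f}")
```

By these numbers the transistor count grows by about 1.85 times and the density by about 1.86 times, since the die area is almost unchanged.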
Because of this new process node developed with GlobalFoundries, AMD says that it’s able to raise the chip’s instructions per clock (IPC). IPC is a measure of how many instructions a core completes in each clock cycle. With the revised Steamroller core on GlobalFoundries’ new process node, AMD said Kaveri is able to offer a 5 to 10 percent improvement in single-threaded IPC and a 15 to 20 percent improvement in multithreaded IPC. While clock speed has been slightly scaled down, this improved metric means that Kaveri is a more efficient chip than Richland.
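To see how an IPC gain trades off against a lower clock, a rough first-order model treats throughput as IPC multiplied by clock speed. The sketch below rests on that simplifying assumption, which ignores memory, turbo behavior and workload effects; the function name is hypothetical, and the only inputs are AMD's quoted IPC ranges.

```python
def break_even_clock_ratio(ipc_gain: float) -> float:
    """Fraction of the old clock speed at which a chip with the given IPC gain
    matches the old chip, assuming throughput = IPC * clock."""
    return 1.0 / (1.0 + ipc_gain)


# AMD's quoted IPC improvements for Kaveri.
for label, gain in [("single-threaded, low", 0.05),
                    ("single-threaded, high", 0.10),
                    ("multithreaded, low", 0.15),
                    ("multithreaded, high", 0.20)]:
    ratio = break_even_clock_ratio(gain)
    print(f"{label}: break-even at {ratio:.1%} of the old clock")
```

Under this model, a 10 percent IPC gain lets the clock fall to roughly 91 percent of its previous value before throughput drops below the older chip.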
But just how does Kaveri fare in benchmarking?