How AMD can “Steamroller” the competition in 2013

You've seen the online coverage of AMD CTO Mark Papermaster's unveiling of the Steamroller micro-architecture at Hot Chips. It looks like what Bulldozer should have been in the first place, a year ago, but some things are still missing if AMD is to compete better against Intel…

… And it's not in the core, actually. The core and related improvements are good, if they really deliver the quoted 45% overall performance gain, clock frequency boost included. But, and it's a big but, they still sit in the same sockets: dual-channel DDR3 memory for the single-die chip, and quad-channel DDR3 for the dual-die server variety.

With up to five dual-core pairs (modules) per die, just two channels of DDR3 memory may not feed the CPU as well as the four channels of DDR3-1866 on next year's Ivy Bridge EP (possibly even DDR3-2133, judging by Inphi's register clock chip announcement this summer for such ECC buffered DIMM support). The effects on memory-bound apps, including the increasingly popular 'big data' and analytics workloads, would be serious.
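To put rough numbers on that gap: peak theoretical DDR3 bandwidth is the transfer rate times 8 bytes (one 64-bit channel) times the number of channels. The short sketch below assumes, for comparison only, that the AMD part also runs DDR3-1866 (the article gives that speed only for Ivy Bridge EP), and it ignores real-world efficiency, which always lands below peak.

```python
# Peak theoretical DRAM bandwidth = transfers/s x bytes per transfer x channels.
# A DDR3 channel is 64 bits (8 bytes) wide; ECC bits are not counted.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(mega_transfers_per_s: int, channels: int) -> float:
    """Return peak bandwidth in GB/s (10**9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


# Assumption: both platforms run DDR3-1866; the article states that speed
# only for Ivy Bridge EP.
dual_1866 = peak_bandwidth_gbs(1866, 2)   # single-die AMD socket
quad_1866 = peak_bandwidth_gbs(1866, 4)   # Ivy Bridge EP
quad_2133 = peak_bandwidth_gbs(2133, 4)   # if DDR3-2133 RDIMMs materialize

cores = 5 * 2  # five dual-core pairs (modules) per die, per the article

print(f"dual-channel DDR3-1866: {dual_1866:.1f} GB/s "
      f"({dual_1866 / cores:.2f} GB/s per core)")
print(f"quad-channel DDR3-1866: {quad_1866:.1f} GB/s")
print(f"quad-channel DDR3-2133: {quad_2133:.1f} GB/s")
```

At equal memory speed, the quad-channel platform simply has twice the peak bandwidth, while ten cores sharing two channels get only about 3 GB/s each at peak.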

Yet AMD is short of the resources, partly through its own doing, to create new sockets with greater memory and interconnect bandwidth until the Excavator generation in 2014. So, aside from asking another party to co-fund a new socket, what option remains? There's one…

Some years ago, AMD was active in, and even held a bunch of patents on, dense but fast eDRAM (low-latency embedded DRAM) technology, which is just a bit slower than SRAM but offers density almost at DRAM level. That work has been (almost) forgotten.

Then, let's take a page from the old Alpha and IBM Power CPUs, as well as the GT3 workstation (Xeon E3 v3) flavours of Haswell: a dedicated cache die outside the CPU die (an L4 in Haswell's case), sitting on a separate, wide backside bus within the chip package, MCM-style.

Combining AMD's eDRAM with the backside L4 cache die approach could give AMD a dedicated L4 cache of, say, 128 MB or even 256 MB, sitting on a wide bus (even 1024 bits) within the chip package. That would go a long way toward offsetting the bandwidth drawback of only two DDR3 channels per die. In some apps, where the code and/or big loops of data fit within that footprint, you could get over double the real-life performance this way alone. What say you, AMD? The time to act is now, and there aren't many options to choose from…
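A back-of-the-envelope model shows why such an L4 could matter. The article gives a bus width but no clock, so the sketch below assumes a hypothetical 1 GHz single-data-rate 1024-bit bus (128 GB/s peak) and uses dual-channel DDR3-1866 peak as the DRAM baseline, itself an assumption; the hit rates are illustrative, not measured. It uses a simple serial model in which each byte comes from either the L4 or DRAM, and it ignores latency, so treat the output as a rough indication for purely bandwidth-bound code rather than a prediction.

```python
# Hypothetical figures: the article gives the bus width (1024 bits) but no
# clock, so 1 GHz single-data-rate is an illustrative assumption only.
L4_BUS_BITS = 1024
L4_CLOCK_HZ = 1e9     # assumption, not an AMD figure
DRAM_GBS = 29.856     # assumed dual-channel DDR3-1866 peak: 2 x 8 bytes x 1866 MT/s


def bus_bandwidth_gbs(bits: int, clock_hz: float) -> float:
    """Peak bandwidth of a bus moving `bits` per clock, in GB/s."""
    return bits / 8 * clock_hz / 1e9


def effective_bandwidth_gbs(hit_rate: float, l4_gbs: float, dram_gbs: float) -> float:
    """Serial (harmonic) model: a fraction hit_rate of the bytes comes from
    the L4, the rest from DRAM; latency and overlap are ignored."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    seconds_per_gb = hit_rate / l4_gbs + (1.0 - hit_rate) / dram_gbs
    return 1.0 / seconds_per_gb


l4_gbs = bus_bandwidth_gbs(L4_BUS_BITS, L4_CLOCK_HZ)  # 128 GB/s
for h in (0.0, 0.5, 0.8, 0.95):
    eff = effective_bandwidth_gbs(h, l4_gbs, DRAM_GBS)
    print(f"L4 hit rate {h:.0%}: {eff:6.1f} GB/s ({eff / DRAM_GBS:.2f}x DRAM alone)")
```

With these assumed numbers, effective bandwidth doubles once roughly two-thirds (about 65%) of the memory traffic hits the L4, which is the kind of working set the 128 MB to 256 MB footprint above would need to capture.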

Picture Credits: Futurama S06E25 'Overclockwise'

Nebojsa Novakovic
In his spare time over the past two decades, an editor and writer of features and analysis on high-end computer hardware and design for European and US media and analyst houses. Reviews of high-end hardware have been my specialty for 28 years.
