Teχlog: Cayman specifications leaked

It seems the Polish website FrazPC had a little mishap this morning. They mistakenly uploaded a lot of slides about Cayman, AMD's upcoming high-end GPU, apparently from a presentation recently given by AMD. I think the deadline was supposed to be today, so either FrazPC got the time wrong or the NDA changed. Either way, I—and others—had just enough time to save the relevant slides, so there you go:

So first of all, we can finally confirm that Cayman is based on a VLIW4 architecture. I've talked about it here, so I won't dwell on it much; let's just say that a VLIW4 unit should be almost as fast as a VLIW5 one, but smaller. We "know" from the recent Antilles leak that Cayman has 30 SIMDs, and here we can see that there is still one quad-TMU per SIMD, so that's 120 TMUs total, a 50% increase over the 80 TMUs of Cypress (HD 5870)! Cayman looks like a real texturing beast.

Cayman also features what you might call a distributed geometry engine, similar to Fermi's, but more limited. Still, it can process two primitives per clock, and at 900MHz or more, that's at least 1800 Mtri/s. That should be more than sufficient in even the most demanding games, but I can't help feeling that there could still be further improvements down the road. The bit about "off-chip buffer support for high-tessellation levels" sort of raises a red flag: it appears that Cayman can't handle high tessellation very well when relying solely on on-die resources. Sure, being able to use an off-chip buffer is better than just choking on excessive information as Cypress seems to do in such cases, but it's not exactly ideal either. As usual, it's a trade-off, of course.
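For anyone who wants to check the geometry figure, here is a quick Python sketch of the arithmetic. The function name is mine, and the 900MHz clock is only the guess used above, not a confirmed specification; the Cypress line uses the HD 5870's one primitive per clock at 850MHz.

```python
def peak_triangle_rate_mtris(prims_per_clock: int, core_clock_mhz: int) -> int:
    """Peak primitive rate in millions of triangles per second (Mtri/s)."""
    # One primitive per clock at 1 MHz is one million triangles per second.
    return prims_per_clock * core_clock_mhz


# Cayman: two primitives per clock, at an ASSUMED 900 MHz core clock.
cayman_rate = peak_triangle_rate_mtris(2, 900)

# Cypress (HD 5870): one primitive per clock at 850 MHz.
cypress_rate = peak_triangle_rate_mtris(1, 850)

print(f"Cayman:  {cayman_rate} Mtri/s")
print(f"Cypress: {cypress_rate} Mtri/s")
print(f"Ratio:   {cayman_rate / cypress_rate:.2f}x")
```

Any clock above 900MHz simply scales the Cayman number up linearly.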

AMD promises tessellation performance at least 50% higher than that of Cypress, and 100+% higher with high tessellation factors. That's not nearly as good as Fermi, but I don't expect any game to reflect that.

As expected, the new VLIW4 units combine simple SPUs to handle transcendentals, occupying 3 slots, as I mentioned in my previous post. Again, thanks go to Gipsel for providing that information. AMD claims similar performance with a ~10% area reduction compared to Cypress, which is exactly what I had predicted. I wish I could say that it was anything more than a lucky guess, but I really can't.

Also note that when the slide says "2 64-bit MUL or ADD", that's a mistake: it really means "either two 64-bit ADDs or one 64-bit MUL". Still, with up to one 64-bit MAD or FMA per clock per VLIW unit, Cayman achieves a DP rate of ¼ of its single-precision rate, which isn't bad at all. Obviously, removing the non DP-capable T unit has helped. GPGPU folks should be happy about that, especially since there's more:

UPDATE: I hadn't even noticed, but the slide also says two 32-bit ADDs per cycle. That's obviously a mistake too: each VLIW unit is capable of four 32-bit ADDs per cycle. Once again, Gipsel was vigilant. This just goes to show that you shouldn't drink and make slides. ;-)
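To make the ¼ figure concrete, here is a small Python sketch of the per-unit rates using the corrected numbers. The constant names are mine. I'm assuming the four 32-bit slots can each issue a MAD or FMA, just as they can issue an ADD, and the Cypress figure of one DP FMA against five SP slots is there for comparison.

```python
from fractions import Fraction

# Per-clock throughput of a single VLIW unit (corrected slide figures).
CAYMAN_SP_OPS = 4   # four 32-bit slots per VLIW4 unit (assumed FMA-capable)
CAYMAN_DP_FMA = 1   # one 64-bit MAD or FMA
CAYMAN_DP_ADD = 2   # two 64-bit ADDs

# Cypress VLIW5 unit: five 32-bit slots (x, y, z, w, t), one 64-bit FMA.
CYPRESS_SP_OPS = 5
CYPRESS_DP_FMA = 1

cayman_dp_rate = Fraction(CAYMAN_DP_FMA, CAYMAN_SP_OPS)
cayman_dp_add_rate = Fraction(CAYMAN_DP_ADD, CAYMAN_SP_OPS)
cypress_dp_rate = Fraction(CYPRESS_DP_FMA, CYPRESS_SP_OPS)

print("Cayman DP FMA rate: ", cayman_dp_rate)
print("Cayman DP ADD rate: ", cayman_dp_add_rate)
print("Cypress DP FMA rate:", cypress_dp_rate)
```

Dropping the T unit removes a single-precision slot that never contributed to double precision, which is why the ratio improves without the DP hardware itself getting any faster.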

There are a few welcome improvements here. Exactly how the L2 cache will be used is unclear, however.

This table doesn't tell us much, but I took the liberty of adding some information, based on the recent leak about Antilles, and an educated guess for the memory bandwidth.

The next two slides introduce a new high-quality anti-aliasing mode…

…which Cayman should be able to handle just fine, thanks to seriously beefed-up ROPs:

But wait, there's more! AMD is introducing a new type of power management that they call "Power Containment":

Exactly how this works is far from clear, but AMD has apparently substantially increased granularity for power management, with regard to both time and functional blocks. This feature is claimed to be user-controllable through AMD's Overdrive utility. At the very least, you should be able to disable it, which could be useful for overclocking, but beyond that I doubt Overdrive lets you modify much.

Well, I wasn't there at the presentation, and I'm not sure what this slide is about exactly, but it seems to highlight the fact that power containment allows the GPU to always remain exactly within TDP, without having to scale clocks down any lower than necessary, and apparently helps with idle power too. I'd suggest waiting for comments from someone who was actually there, though.

All in all, when compared to Cypress, Cayman provides:

20% more SPUs, which are more efficient,
50% more (slightly less capable) VLIW units,
50% more TMUs,
100% higher geometry throughput per clock,
significantly improved ROPs,
10% higher bandwidth, maybe more,
higher clocks, most likely,
a few bits here and there…
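
The unit counts behind the first three items can be reproduced with a short Python sketch. The class and field names are mine. The 30-SIMD figure comes from the Antilles leak, and I'm assuming Cayman keeps 16 VLIW units per SIMD as Cypress does, so treat the Cayman totals as estimates rather than confirmed specs.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    simds: int
    lanes_per_vliw: int       # 5 for VLIW5, 4 for VLIW4
    vliw_per_simd: int = 16   # true for Cypress; ASSUMED unchanged for Cayman
    tmus_per_simd: int = 4    # one quad-TMU per SIMD

    @property
    def vliw_units(self) -> int:
        return self.simds * self.vliw_per_simd

    @property
    def spus(self) -> int:
        return self.vliw_units * self.lanes_per_vliw

    @property
    def tmus(self) -> int:
        return self.simds * self.tmus_per_simd


def percent_more(new: int, old: int) -> int:
    """Whole-number percentage increase of new over old."""
    return (new - old) * 100 // old


cypress = Gpu(simds=20, lanes_per_vliw=5)
cayman = Gpu(simds=30, lanes_per_vliw=4)  # 30 SIMDs per the Antilles leak

for field in ("spus", "vliw_units", "tmus"):
    old, new = getattr(cypress, field), getattr(cayman, field)
    print(f"{field}: {old} -> {new} (+{percent_more(new, old)}%)")
```

The SPU count grows by less than the VLIW unit count because each unit loses one lane in the move from VLIW5 to VLIW4.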

Judging from all this, I'm going on record saying it should be faster than the GTX 580 when it is released some time next month. In the meantime, we'll just have to wait.

Monday, November 22, 2010
