tag:blogger.com,1999:blog-42300006108476841732024-03-13T17:31:47.923+01:00TeχlogTeχlog is dedicated to computer hardware, software, games, technology in general… and sometimes other things. It's mostly about hardware, though.Unknownnoreply@blogger.comBlogger35125tag:blogger.com,1999:blog-4230000610847684173.post-4797405029897900512012-07-09T23:39:00.001+02:002017-06-22T20:29:29.024+02:00Of the Energy Consumption of Graphics Cards<div style="text-align: justify;">
A long time ago, when narcissists could only stare at their mirrors for hours without being able to post thousands of self-portraits on Facebook, when every thought that crossed your mind was likely to die there without being tweeted the world over, the power consumption of graphics cards was not deemed important. End-users cared about the operating temperatures of their devices, and their noise levels, but little more. Some of them did, however, engage in overclocking, and thus applications such as Furmark and OCCT were born. These made it easy to test the stability of overclocked cards by pushing them to their functional and thermal limits.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
But gradually, consumer computing became more mobile, just as high-end graphics cards became ever more power-hungry, reaching, and sometimes even exceeding, 300W. Naturally, end-users started caring about power, and reviewers began searching for ways to better inform their readers. They turned to commonly used stress tests (e.g. Furmark and OCCT) and measured the power consumption of graphics cards while those tests were running. For a while, this proved useful: it gave consumers an upper bound for power draw (give or take a few watts, to account for the natural variability from sample to sample).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
But hardware vendors were well aware of the increasing importance of power, and therefore started adding increasingly sophisticated control mechanisms meant to limit power to a certain level. When these were first introduced, reviewers noted that they did in fact cap power to the specified level, but apparently gave the matter little further thought. By now, however, most of them have realized that power control mechanisms such as AMD's PowerTune effectively make stress tests irrelevant, since such tests no longer actually stress the GPU. At best, they still provide readers with an upper bound for power, but it happens to be, give or take a few watts, the card's thermal design power, which does not make this information very helpful.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In reaction, most reviewers decided to test power consumption in real video games instead, thus giving a more realistic idea of what cards may draw in real-world scenarios. But as Damien Triolet <a href="http://www.hardware.fr/articles/869-3/consommation-performances-watt.html">showed</a>, the power draw of different cards relative to one another may differ significantly from one game to another. More specifically, AMD cards seem to consume more power in Anno 2070 than in Battlefield 3, relative to their NVIDIA counterparts. A careful observer will further note that AMD cards perform better in Anno. One can therefore suppose that they reach higher occupancy of their computing units in this game, which leads to higher performance, but also higher power consumption. Finally, even though their power consumption increases in Anno, their power-efficiency (relative to GeForces) increases as well. This makes sense, as it is generally better to have busy transistors than idle ones, sitting around and doing nothing more than leaking current. So higher performance in a given game tends to lead to higher efficiency as well (relative to the competition). In other words, performing well remains a good thing. That is reassuring, but there remains a problem: how can we determine the real power-efficiency of graphics cards over a broad range of games?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Perhaps there are trends: certain engines and certain types of games may favor one architecture more than the other, and perhaps there exist good representatives for each "class" of games, which could be used for power testing. But to my knowledge, no one has yet identified them, if they do indeed exist. And that is not the only problem. Indeed, while most reviewers do not specify the exact nature of the power figures they present, I believe they generally give the maximum instantaneous power draw recorded. This may be considered somewhat useful, as it gives some idea of the kind of power supply unit necessary to feed a given card, but it does not guarantee that no game will ever require more power. More importantly, it does not tell us which card consumed more energy over the length of the benchmark. Indeed, a card X may have drawn an average of 150W with a peak of 170W, while a card Y drew an average of 130W with a peak of 185W. Card Y may require a slightly more powerful PSU, but it is nevertheless more energy-efficient than card X.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The only possible conclusion is that reviewers ought to measure the total <em>energy</em> consumed by each card they are testing, in <em>each game they are testing</em>; otherwise, their power consumption figures only give a very approximate, and possibly misleading, picture of reality. This does, of course, increase their workload significantly, but the previous observations lead me to believe that it is necessary.</div>
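<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For what it's worth, once such a power log exists, turning it into an energy figure takes only a few lines. Here is a minimal PowerShell sketch, assuming a hypothetical <i>power_log.csv</i> with one sample per second in a column named <i>Watts</i> (the file name, the column name, and the sampling rate are all made up for the sake of the example):</div>
<pre>
# Hypothetical log: one power sample per second, in watts, in a column named "Watts"
$samples = Import-Csv .\power_log.csv

# With 1-second samples, the sum of the watt values is watt-seconds, i.e. joules
$joules = ($samples | Measure-Object -Property Watts -Sum).Sum

# Report the total energy in kilojoules and watt-hours, plus the length of the run
"{0:N1} kJ ({1:N2} Wh) over {2} s" -f ($joules / 1000), ($joules / 3600), $samples.Count
</pre>
<div style="text-align: justify;">
Computed over the full length of each benchmark, that figure is the one that tells you which card actually consumed more energy, regardless of where its peaks fall.</div>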
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
PS: Scott Wasson from The Tech Report has been using a rather innovative methodology for performance testing since last September. It is detailed <a href="http://techreport.com/articles.x/21516">here</a>, it is very good, and I think every reviewer should adopt it. I do not know how he might feel about this, but he should welcome it—after all, imitation is the sincerest form of flattery, and good ideas are meant to be spread.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I should also note that while this entry only mentions graphics cards, that is because they can draw up to 300W, and sometimes even more with some of the crazier, low-volume models. Most of what I said holds true for CPUs as well, or really just about any component.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="http://thetexlog.blogspot.com/2010/11/amd-pulls-nvidia.html"><span style="font-size: x-small;">Furthermore, I think that AMD and NVIDIA's renaming practices are dishonest and harmful to consumers, and that they need to stop.</span></a>
</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-14949996930096700962012-03-30T21:09:00.001+02:002012-03-30T21:14:11.438+02:00Tales of Woe of a Windows user<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: justify;"><tbody>
<tr><td style="text-align: center;"><a href="http://www.stopstressingnow.com/wp-content/uploads/2009/10/computer_frustration1.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="189" src="http://www.stopstressingnow.com/wp-content/uploads/2009/10/computer_frustration1.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">This woman is trying to empty a Windows folder. God help her.</td></tr>
</tbody></table><div style="text-align: justify;">I've not updated this blog in a long time, but today I'd like to share a painful experience. First, I'm an Opera user, and while I'm overall quite happy with the browser, it won't empty its disk cache on exit, even though it is supposed to. As a result, the cache tends to exceed 10GB after a while. When I first noticed this, I figured "OK, that's annoying but no big deal, I'll just create a shortcut to the cache directory on my desktop and manually empty it from time to time."</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">So I created the shortcut, but things didn't exactly go as planned. Opera breaks down its cache into many small <i>.tmp</i> files, so the cache directory often contains many thousands of tiny files. Now, as any Windows user is aware, simply deleting a file will actually send it to the recycle bin, but pressing <i>shift</i> at the same time will delete it for good. For some obscure reason that still eludes me, it is impossible to fully delete more than about 1200 files at once without going through the recycle bin, or at least my system refuses to do so and sends everything to the bin. I'm not sure whether Windows modifies the files in any way when sending them to the bin, but because they are so numerous in the disk cache, the deletion process takes forever. As if that weren't enough, it also slows the entire system down, and I still have to empty the bin afterwards to actually delete the files.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Needless to say, I quickly decided that this just wasn't an acceptable way of doing this, and I had to find something better. On most distributions of Linux, I just would have right-clicked in the folder, clicked <i>Open terminal here</i> or some similar option, typed <i>rm *</i> and pressed <i>Enter</i>, for a grand total of 2 clicks and 5 keys pressed. So I figured that doing something similar on Windows ought to be possible. It turns out that directly opening a PowerShell in a specific directory is impossible, so I had to open it at the root of my system, and then go to the desired directory. That sounds trivial, but PowerShell's auto-completion system is so terrible that it's actually quite a pain in the ass. Anyway, I finally reached the cache directory, typed <i>Remove-Item *</i>, pressed <i>Enter</i>, and finally my cache emptied itself correctly, without going through the bin, and taking a reasonable amount of time.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">But this still wasn't good enough. Opening a shell and going to the right directory was far too tedious to do manually every time, so I decided to just write a script to do it. So I did, and I put it on my desktop. Then, I double-clicked it… which launched Notepad, with the script in it. OK, fine, Linux would have asked me whether I wanted to run it or display it, but that's not too bad. I closed it, right-clicked it, and clicked <i>Run with Powershell</i>. Then, a nice little command-line window opened, displayed some red text for a fraction of a second, and closed before I could read a single word. WTF? I tried it again, and managed to take a screenshot just in time. As it turned out (after a good bit of googling around) it's not possible to run your own scripts on Windows with the default settings. You have to launch PowerShell, and type in <i>Set-ExecutionPolicy RemoteSigned</i>, otherwise the OS refuses to run your scripts, because it doesn't trust them. So I did just that, and now I can run my little script from my desktop to empty my disk cache, with minimal human input and just a few seconds of work for the system.<a href="http://www.stopstressingnow.com/wp-content/uploads/2009/10/computer_frustration1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"></a></div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Sure, originally, this is a bug on Opera's side, so in a way, this is Opera Software's fault more than Microsoft's. But the point of this blog entry is that because Windows is so weird, and in some ways so far behind Linux (on which it would have been the most trivial thing in the world) it took me 20 minutes to figure out how to empty a fucking folder.</div>Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-4230000610847684173.post-89930341001312257782011-10-12T20:35:00.001+02:002011-10-12T23:11:25.024+02:00Bulldozer is outHere are a couple of reviews: <a href="http://techreport.com/articles.x/21813">The Tech Report</a> and <a href="http://www.hardware.fr/articles/842-1/amd-fx-8150-fx-6100-bulldozer-debarque-am3.html">Hardware.Fr</a> [French].<br />
<br />
And here's a summary:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/-1aUIMGxawIk/TpXaeGe3T4I/AAAAAAAAAFY/vNaOTJMoqhE/s1600/Bulldozer.gif" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://1.bp.blogspot.com/-1aUIMGxawIk/TpXaeGe3T4I/AAAAAAAAAFY/vNaOTJMoqhE/s1600/Bulldozer.gif" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">OK, technically, that's a bobcat. But it's still funny.</td></tr>
</tbody></table><div class="separator" style="clear: both; text-align: center;"><br />
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-56167871376063368952010-12-15T18:30:00.003+01:002010-12-16T16:38:36.222+01:00Radeon HD 6970 & 6950 launched… and disappointing.<div align="justify">The Radeon HD 6970 and its slower buddy, the HD 6950, were launched this morning… and kind of blow. OK, they're not bad cards, they're actually better than the HD 5870 and 5850, which they replace and which were already quite good, but the improvement falls seriously short of expectations: basically, the HD 6970 is about equal to the GeForce GTX 570, while the HD 6950 is just a bit faster than the HD 5870.</div><div align="justify"><br />
</div><table align="justify" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/_KltGV0kj344/TQj51ek1WII/AAAAAAAAAEQ/Bjp2UW0-PBQ/s1600/HD+6900+Cards.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="225" src="http://3.bp.blogspot.com/_KltGV0kj344/TQj51ek1WII/AAAAAAAAAEQ/Bjp2UW0-PBQ/s400/HD+6900+Cards.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">At least they're pretty…</td></tr>
</tbody></table><div align="justify"><br />
</div><div align="justify">They also draw a bit more power than the HD 5800s, so there isn't much of an improvement in the performance/W department. I'll probably have further comments on those cards later, but for now, here are a few reviews worth reading:</div><div align="justify"><br />
</div><ul align="justify"><li><a href="http://techreport.com/articles.x/20126">The Tech Report</a></li>
<li><a href="http://www.anandtech.com/show/4061/amds-radeon-hd-6970-radeon-hd-6950">AnandTech</a></li>
<li><a href="http://www.hardware.fr/articles/813-1/dossier-amd-radeon-hd-6970-6950-seules-crossfire-x.html">Hardware.fr</a> [French]</li>
<li><a href="http://www.pcgameshardware.de/aid,803440/Radeon-HD-6970-und-HD-6950-im-Test-AMDs-neue-Oberklasse-Grafikkarten/Grafikkarte/Test/">PCGamesHardware</a> [German]</li>
<li><a href="http://www.ixbt.com/video3/cayman-part1.shtml">iXBT</a> [Russian]</li>
</ul><div align="justify"><br />
</div><div align="justify">And <a href="http://forum.beyond3d.com/showpost.php?p=1503588&postcount=1">here</a> is a more comprehensive list.<br />
<br />
<a href="http://thetexlog.blogspot.com/2010/11/amd-pulls-nvidia.html"><span style="font-size: x-small;">Furthermore, I think that AMD and NVIDIA's renaming practices are dishonest and harmful to consumers, and that they need to stop.</span></a> </div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-29324418449605427662010-12-13T22:00:00.001+01:002010-12-16T16:38:31.480+01:00Graphics update: Radeon HD 6970 & 6950<div align="justify">I haven't been posting much lately, mostly because I've been busy, but also because there's a lot of nonsense floating around about the upcoming Radeons, which makes it quite difficult to sort out the truth from the FUD. That said, the following specs, gathered by <a href="http://www.expreview.com/">Expreview</a>, are almost certain.</div><div align="justify"><br />
</div><table align="justify" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/_KltGV0kj344/TQaCOzD8MDI/AAAAAAAAAEI/cBjnZqaTO90/s1600/hd6970_specs_expreview.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="225" src="http://3.bp.blogspot.com/_KltGV0kj344/TQaCOzD8MDI/AAAAAAAAAEI/cBjnZqaTO90/s400/hd6970_specs_expreview.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Radeon HD 6970: specifications</td><td class="tr-caption" style="text-align: center;"></td></tr>
</tbody></table><table align="justify" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/_KltGV0kj344/TQaGlnx6lUI/AAAAAAAAAEM/lYkcetciFkQ/s1600/hd6950_specs_expreview.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="225" src="http://4.bp.blogspot.com/_KltGV0kj344/TQaGlnx6lUI/AAAAAAAAAEM/lYkcetciFkQ/s400/hd6950_specs_expreview.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Radeon HD 6950: specifications</td></tr>
</tbody></table><div align="justify">These specifications may not appear all that impressive, but AMD has apparently improved the architecture's efficiency by quite a bit. Indeed, the HD 6970 seems to perform a solid 10% (if not more) above the GTX 570, which would make it almost as fast as the 580, according to performance numbers floating around.</div><div align="justify"><br />
</div><div align="justify">The HD 6950 should therefore be roughly 5% below the GTX 570, perhaps equal or even slightly faster. In any case, it should be quite a close call. Exact power figures and prices have been rather elusive so far, but the cards will be released on December 15, so the wait is almost over.<br />
<br />
<br />
It's going to be a merry Christmas indeed… :-)<br />
<br />
<a href="http://thetexlog.blogspot.com/2010/11/amd-pulls-nvidia.html"><span style="font-size: x-small;">Furthermore, I think that AMD and NVIDIA's renaming practices are dishonest and harmful to consumers, and that they need to stop.</span></a></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-26857130505040932012010-12-10T14:50:00.000+01:002010-12-10T14:50:03.148+01:00RealWorld Tech: Introduction to OpenCL<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_KltGV0kj344/TOl7f3cFegI/AAAAAAAAAC0/Vs1GjOMyEYo/s1600/OpenCL_Logo_RGB.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="http://1.bp.blogspot.com/_KltGV0kj344/TOl7f3cFegI/AAAAAAAAAC0/Vs1GjOMyEYo/s200/OpenCL_Logo_RGB.png" width="200" /></a></div><div align="justify">Real World Technologies has recently published <a href="http://www.realworldtech.com/page.cfm?ArticleID=RWT120710035639">an introduction to OpenCL</a>, written by David Kanter. It's a pretty good place to start if you're interested in that kind of thing.</div><div align="justify"><br />
</div><div align="justify">Have a nice read.</div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-4230000610847684173.post-10327175746072491602010-12-07T16:37:00.004+01:002010-12-13T22:01:22.975+01:00AMD Cayman (Radeon HD 6970) on December 15 after all<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/_KltGV0kj344/TOrMZ0dzXUI/AAAAAAAAADQ/lBrpW_kDOFs/s1600/Fot_019_ed.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="133" src="http://4.bp.blogspot.com/_KltGV0kj344/TOrMZ0dzXUI/AAAAAAAAADQ/lBrpW_kDOFs/s200/Fot_019_ed.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><strong>UPDATE:</strong> wrong specs</td></tr>
</tbody></table><div align="justify">Well, the title pretty much says it all. AMD's upcoming high-end graphics cards, the HD 6970 and HD 6950, based on <a href="http://thetexlog.blogspot.com/2010/11/cayman-specifications-leaked.html">Cayman</a>, will be released on December 15, and that's pretty much official.</div><div align="justify"><br />
</div><div align="justify">Stay tuned.<br />
<br />
<a href="http://thetexlog.blogspot.com/2010/11/amd-pulls-nvidia.html"><span style="font-size: x-small;">Furthermore, I think that AMD and NVIDIA's renaming practices are dishonest and harmful to consumers, and that they need to stop.</span></a></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-32361159094165358442010-12-07T16:33:00.001+01:002010-12-07T16:39:11.382+01:00GeForce GTX 570 reviews out<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/_KltGV0kj344/TP5SSv0OsvI/AAAAAAAAAEE/W8pASIGQbMY/s1600/GTX+570.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="125" src="http://2.bp.blogspot.com/_KltGV0kj344/TP5SSv0OsvI/AAAAAAAAAEE/W8pASIGQbMY/s200/GTX+570.png" width="200" /></a></div><div align="justify">The GeForce GTX 570 has been released, and reviews are available from the usual suspects: <a href="http://techreport.com/articles.x/20088/1">The Tech Report</a>, <a href="http://www.anandtech.com/show/4051/nvidias-geforce-gtx-570-filling-in-the-gaps">Anandtech</a>, <a href="http://www.hardware.fr/articles/811-1/dossier-nvidia-geforce-gtx-570.html">Hardware.fr</a>…</div><div align="justify"><br />
</div><div align="justify">In <a href="http://thetexlog.blogspot.com/2010/11/geforce-gtx-470-specs-and-release-date.html">a previous post</a>, I had predicted performance slightly below that of the GTX 480, but apparently I was a bit pessimistic, since it turned out to be a hair higher. As expected, some games do see a slight performance drop, especially in very high definitions, but on average, performance is just a bit higher. The GTX 570 draws about 250W under full load, which is closer to the 470 than to the 480, so that's good news. Noise levels are very reasonable too. All in all, at $350, this card looks like a pretty good addition to NVIDIA's lineup.</div><div align="justify"><br />
</div><div align="justify">That said, it's a bit worrying that the 570's maximum power draw (under Furmark) is higher than the 470's, just as the 580's is higher than the 480's. I don't know about you, but I don't really like the direction this is going.</div><div align="justify"><br />
</div><div align="justify">Now, let's just see what AMD can bring to the table.<br />
<br />
<a href="http://thetexlog.blogspot.com/2010/11/amd-pulls-nvidia.html"><span style="font-size: x-small;">Furthermore, I think that AMD and NVIDIA's renaming practices are dishonest and harmful to consumers, and that they need to stop.</span></a> </div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-24785023059709829382010-12-04T11:54:00.002+01:002010-12-05T12:19:11.091+01:00AMD Radeon HD 6970 & 6950 on December 8?<div class="separator" style="clear: both; text-align: center;"></div><table align="justify" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/_KltGV0kj344/TOrMZ0dzXUI/AAAAAAAAADQ/lBrpW_kDOFs/s1600/Fot_019_ed.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="214" src="http://4.bp.blogspot.com/_KltGV0kj344/TOrMZ0dzXUI/AAAAAAAAADQ/lBrpW_kDOFs/s320/Fot_019_ed.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Specifications for the HD 6970, plus my own speculation.</td><td class="tr-caption" style="text-align: center;"></td></tr>
</tbody></table><div align="justify">We already knew that the GeForce GTX 570 was supposed to be <a href="http://thetexlog.blogspot.com/2010/11/geforce-gtx-470-specs-and-release-date.html">released on December 7</a>, but now it appears that AMD's upcoming high-end graphics cards, the HD 6970 and 6950, should be launched the following day, at least according to <a href="http://www.zdnet.com/blog/computers/nvidia-geforce-gtx-570-coming-on-december-7-one-day-before-amd-radeon-hd-6900-launch/4335">ZDNet</a>. Those two are powered by the GPU known as <i>Cayman</i>, which I've already talked about <a href="http://thetexlog.blogspot.com/2010/11/cayman-specifications-leaked.html">here</a>.</div><div align="justify"><br />
</div><div align="justify">I don't recall hearing that date before, so perhaps ZDNet knows something most other rumor sites don't. Other people have mentioned December 13, or the following week, so who knows?</div><div align="justify"><br />
</div><div align="justify">On a related note, Charlie Demerjian from <a href="http://www.semiaccurate.com/2010/12/03/ati-cuts-6950-allocation/">SemiAccurate</a> says AMD has just reduced allocation for the HD 6950, mostly in favor of the 6970. That's a good sign for yields, though there could be marketing/business reasons for this move too.</div><div align="justify"><br />
</div><div align="justify">In any case, the wait should be over soon.<br />
<br />
<a href="http://thetexlog.blogspot.com/2010/11/amd-pulls-nvidia.html"><span style="font-size: x-small;">Furthermore, I think that AMD and NVIDIA's renaming practices are dishonest and harmful to consumers, and that they need to stop.</span></a> </div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-63278290741335047832010-12-01T23:23:00.001+01:002010-12-05T12:18:58.966+01:00SemiAccurate: GF110 is 550mm²<div align="justify"><a href="http://2.bp.blogspot.com/_KltGV0kj344/TN-_IEumGQI/AAAAAAAAACI/8Ly9UXAHlkU/s1600/nvidia-geforce-gtx-580-video-card.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="135" src="http://2.bp.blogspot.com/_KltGV0kj344/TN-_IEumGQI/AAAAAAAAACI/8Ly9UXAHlkU/s200/nvidia-geforce-gtx-580-video-card.jpg" width="200" /></a>According to Charlie Demerjian's <a href="http://www.semiaccurate.com/2010/12/01/nvidia-gtx580gf110-bigger-gtx480gf100/">recent post</a> on SemiAccurate, and contrary to what most people thought, NVIDIA's latest GPU, GF110, is actually bigger than GF100, with a very respectable size of about 550mm².</div><div align="justify"><br />
</div><div align="justify">GF110 is the chip that powers the GTX 580 and <a href="http://thetexlog.blogspot.com/2010/11/geforce-gtx-470-specs-and-release-date.html">570</a>, for those of you who might not remember. However, bear in mind that if GF110 enjoys better yields than GF100 (which is quite likely) it could be cheaper to produce than GF100 ever was, in spite of its larger size.</div><div align="justify"><br />
</div><div align="justify">In any case, unless AMD's Cayman is substantially larger than we've been led to believe, NVIDIA will have a hard time fighting it with such a large chip and the high manufacturing costs that go with it.<br />
<br />
<a href="http://thetexlog.blogspot.com/2010/11/amd-pulls-nvidia.html"><span style="font-size: x-small;">Furthermore, I think that AMD and NVIDIA's renaming practices are dishonest and harmful to consumers, and that they need to stop.</span></a> </div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-87936956664339695642010-12-01T19:21:00.001+01:002010-12-05T12:18:46.168+01:00The Tech Report: high-end cards and multi-GPU<div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/_KltGV0kj344/TPaSES9AExI/AAAAAAAAAEA/hiQ_YZd-oXQ/s1600/TechReport+SLI+Crossfire.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="186" src="http://4.bp.blogspot.com/_KltGV0kj344/TPaSES9AExI/AAAAAAAAAEA/hiQ_YZd-oXQ/s320/TechReport+SLI+Crossfire.jpg" width="320" /></a></div><div align="justify">The Tech Report has just published <a href="http://techreport.com/articles.x/20043">an interesting graphics cards review</a>, in which they pitted very high-end graphics cards against pairs of cheaper models, trying to achieve the highest possible performance/price ratio for very high performance levels.</div><div align="justify"><br />
</div><div align="justify">The result is rather interesting, with the best solution being arguably a Crossfire of HD 6850s. The <a href="http://techreport.com/articles.x/20043/13">thirteenth page</a> has a nice scatter graph, plotting performance against price.</div><div align="justify"><br />
</div><div align="justify">Have a nice read.<br />
<br />
<a href="http://thetexlog.blogspot.com/2010/11/amd-pulls-nvidia.html"><span style="font-size: x-small;">Furthermore, I think that AMD and NVIDIA's renaming practices are dishonest and harmful to consumers, and that they need to stop.</span></a> </div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-92122505481864169792010-11-30T22:11:00.004+01:002010-11-30T22:42:22.611+01:00GeForce GTX 570: specs and release date<div align="justify">There had been whispers going around for a few weeks about the GeForce GTX 570, but now, thanks to the guys from <a href="http://www.sweclockers.com/nyhet/13132-geforce-gtx-570-samma-dag-som-cataclysm">Sweclockers</a>, we have the specifications, and a release date: December 7.</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/_KltGV0kj344/TPVhkp8pHHI/AAAAAAAAAD4/P6lxyXFbGCI/s1600/Geforce+GTX+570.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="http://2.bp.blogspot.com/_KltGV0kj344/TPVhkp8pHHI/AAAAAAAAAD4/P6lxyXFbGCI/s400/Geforce+GTX+570.jpg" width="303" /></a></div><div align="justify"><br />
</div><div align="justify">This new card comes with 480 shaders, like the GTX 480, 1280MB or RAM, like the GTX 470, and 732/1464/3800MHz, for the base, shader, and RAM clocks, respectively… And those clocks are higher than the 480's. Confused, yet? So am I, so let's crunch a few numbers, shall we?</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/_KltGV0kj344/TPViulzN-II/AAAAAAAAAD8/FvxF_1V6rnw/s1600/GTX+570+specifications.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="165" src="http://3.bp.blogspot.com/_KltGV0kj344/TPViulzN-II/AAAAAAAAAD8/FvxF_1V6rnw/s320/GTX+570+specifications.png" width="320" /></a></div><div align="justify"><br />
<br />
</div><div align="justify">Now that's better. The rightmost column indicates how much better than the GTX 480 the 570 is, as a (sometimes negative) percentage. If it's green, then it's better, if it's red, it's worse—so far so good, right? If it's yellow, it's either neutral or not directly important. For instance, memory bus width in itself doesn't matter, but it contributes to memory bandwidth, which does. Therefore, <i>memory bus width</i> is yellow, but <i>memory bandwidth</i> could be either green or red (it happens to be red in this case).</div><div align="justify">Also note that this table chart doesn't show an important detail: in some cases, namely when processing RGB9E5 or FP16 textures, the GTX 580 and 570's TMUs are twice as fast as their predecessors'. The effect of this obviously depends on whether those formats are used in a particular game, and to what extent. In practice, you could see a performance gain anywhere between 0 and 15%, maybe more in very few pathological cases.</div><div align="justify"><br />
</div><div align="justify">So, compared to the GTX 480, the 570 has slightly higher shader and texturing throughput, especially considering the improved TMUs, and that should help a bit. It also has slightly higher triangle throughput, but the GTX 480 was far from bottlenecked in this area, so it shouldn't have any measurable effect. Likewise, the GTX 570 has significantly less memory, but I don't expect that to be a problem in 1920×1200; in 2560×1600 with anti-aliasing, however, it could be.</div><div align="justify">The main problems are memory bandwidth and, to a somewhat lesser extent, fillrate. Those two go down by 13~14%, and I suspect it will have a significant effect.</div><div align="justify"><br />
</div><div align="justify">All in all, it's hard to say how the GTX 570 will compare to the 480. I think we'll see it being slightly faster in some games, slower in others. Perhaps something like 3~5% slower on average, but I don't expect the gap to be larger than this.</div><div align="justify"><br />
</div><div align="justify">Finally, Sweclockers' information mentions a TDP of 225W, which is 25W less than the GTX 480's official TDP, and 75W less than its actual maximum power draw. Then again, the GTX 580 has a TDP of 244W, but with its limiter off, <a href="http://thetexlog.blogspot.com/2010/11/today-is-crazy-graphics-cards-day.html">it has been measured well upwards of 300W</a>, so who knows?</div><div align="justify"><br />
</div><div align="justify">In any case, the GTX 570 looks like a good replacement for the 480: though it might be a bit slower in some games, on average it should perform similarly, but with lower power consumption, and hopefully much lower noise levels. The big question is how expensive it will be, and of course, how it will compare to AMD's Cayman, which is due just a few days after the 570's launch.<br />
<br />
PS: I'm happy to announce that Teχlog has just reached 1000 pageviews: a modest milestone, but the first ones always are… :-)<br />
<br />
<a href="http://thetexlog.blogspot.com/2010/11/amd-pulls-nvidia.html"><span style="font-size: x-small;">Furthermore, I think that AMD and NVIDIA's renaming practices are dishonest and harmful to consumers, and that they need to stop.</span></a></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-78399446156968456052010-11-29T12:47:00.002+01:002010-11-30T00:27:12.554+01:00AMD pulls an NVIDIA<div align="right" class="separator" style="clear: both; text-align: center;"></div><div align="justify"><a href="http://3.bp.blogspot.com/_KltGV0kj344/TPOR_Er-_AI/AAAAAAAAAD0/qOhsuf91ASU/s1600/ScreenShot141.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="http://3.bp.blogspot.com/_KltGV0kj344/TPOR_Er-_AI/AAAAAAAAAD0/qOhsuf91ASU/s1600/ScreenShot141.png" /></a>A few years ago, renaming products was commonplace in the graphics world. Then, AMD sort of stopped doing it, and NVIDIA started doing it a lot more. The latter therefore gained a reputation for being something of a serial-renamer.</div><div align="justify"><br />
</div><div align="justify">But last year, AMD surprised everyone by introducing new HD 5000 products that were in fact renamed HD 4000s, such as the Mobility Radeon HD 5145, 5165, or the oddly-named HD 530v/540v/550v/560v. AMD argued that OEMs demanded new names for existing 55nm DX10.1 designs. People complained for a day or two, but then forgot about it. After all, those were only low-end mobile products, and the commercial designations indicated fairly clearly that they were inferior to proper DX11 designs such as the HD 5600, for instance. Nevertheless, that was regrettable.</div><div align="justify"><br />
</div><div align="justify">Then, a bit over a month ago, AMD introduced the Radeon HD 6800s, which were slower than the HD 5800s. While that wasn't strictly speaking a renaming, it was still misleading and an unpleasant surprise.</div><div align="justify"><br />
</div><div align="justify">And today, AMD has just released "new" products, namely the <a href="http://www.amd.com/us/products/notebook/graphics/amd-radeon-6000m/amd-radeon-6500m/Pages/amd-radeon-6500m.aspx#2">HD 6500M</a> and <a href="http://www.amd.com/us/products/notebook/graphics/amd-radeon-6000m/amd-radeon-6300m/Pages/amd-radeon-6300m.aspx#2">HD 6300M</a>. Now you might think that those are mobile derivatives of AMD's latest Northern Islands architecture, but they're not. The specifications for these two additions to AMD's lineup state that they feature the "UVD 2 dedicated video playback accelerator" which is a component of Evergreen, otherwise known as the HD 5000 series. Those parts are in fact renamed Evergreen products. More specifically, the HD 6500M bears striking resemblance to the <a href="http://www.amd.com/US/PRODUCTS/NOTEBOOK/GRAPHICS/ATI-MOBILITY-HD-5700/Pages/hd-5770-specs.aspx">Mobility HD 5770</a>, and the HD 6300M reminds me a lot of the <a href="http://www.amd.com/us/products/notebook/graphics/ati-mobility-hd-5400/Pages/hd-5470-specs.aspx">Mobility HD 5470</a>. Let me take that opportunity to say that AMD's website is a pain to navigate.</div><div align="justify"><br />
</div><div align="justify">Also note that the HD 6300M and 6500M have pretty loose specifications as far as clocks are concerned, or even memory type. In practice, the 6500M present in a laptop could be clocked at 500MHz with 900MHz DDR3, or at 650MHz with 900MHz GDDR5, with the same name!</div><div align="justify"><br />
</div><div align="justify"><a href="http://4.bp.blogspot.com/_KltGV0kj344/TPAYBu1iN9I/AAAAAAAAADs/KiTRjKdexpU/s1600/NVIDIA+logo.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="76" src="http://4.bp.blogspot.com/_KltGV0kj344/TPAYBu1iN9I/AAAAAAAAADs/KiTRjKdexpU/s200/NVIDIA+logo.png" width="200" /></a>This sort of thing creates a very confusing situation for consumers. When you can't trust the name of a SKU to reflect its generation, it's bad. When you can't even trust the name of a SKU to refer to one product with precise specifications, it's worse. The thing is, both AMD and NVIDIA do this because it works: it helps them sell more graphics cards. The press usually makes a couple of snide comments, but quickly moves on. Clearly, that's not enough to deter such behavior.</div><div align="justify"><br />
</div><div align="justify">This is why I've decided to go on a little crusade of my own, in the hope that it will get AMD and NVIDIA to stop doing this. Obviously, there's no way I can succeed on my own, so I urge every member of the tech press to do the same: from now on, every single post about NVIDIA or AMD will be concluded with the following sentence, linking to this post.</div><div align="justify"><br />
</div><div align="justify"><i>Furthermore, I think that AMD and NVIDIA's renaming practices are dishonest and harmful to consumers, and that they need to stop.</i> </div><div align="justify"><br />
</div><div align="justify">Hey, it worked for Cato the Elder.<br />
<br />
<b>UPDATE:</b> Dave Baumann chimed in <a href="http://forum.beyond3d.com/showpost.php?p=1498280&postcount=6">here</a>, and made the following comment: <i>These support hardware accellerated MVC (Blu-Ray 3D) playback where Mobility Radeon HD 5000 didn't. And across the board HDMI 1.4a support.</i><br />
<br />
I appreciate that, but I still don't think that the HD 6000 name is justified.<br />
<br />
<strong>UPDATE 2:</strong> More information from Dave <a href="http://forum.beyond3d.com/showpost.php?p=1498294&postcount=8">here</a>: <em>UVD2 has to be driven in a different way in order to get MVC decode and this requires a VBIOS update (or an SBIOS update in the cases of most notebooks) and additionally requires qualification by us and the vendor. HDMI 1.4a can be achieved by a driver upate (as it was on desktop Radeon HD 5000) but some notebook vendors still re-qual the software updates.</em><br />
<br />
This pretty much confirms that we're dealing with the same chip. </div>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-4230000610847684173.post-35420013289123846282010-11-27T17:07:00.000+01:002010-11-27T17:07:15.055+01:00Hans de Vries dissects Bulldozer<div align="justify">For every major CPU release, Hans de Vries from <a href="http://www.chip-architect.com/">Chip Architect</a> takes a look at the die shot with his magic magnifying glass and tries to determine just which part does what. And as expected, he's done it with Bulldozer too. This time, though, it was a little bit trickier than usual because AMD went to extra trouble and photoshopped the die shot, scaling parts up and down, blurring stuff, cutting and pasting components… The point of this was to make it difficult for Intel to draw any firm conclusions from the picture.</div><div align="justify"><br />
</div><div align="justify">But that wasn't enough to discourage Hans, and here's what he's been able to produce:</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/_KltGV0kj344/TPEr0Wzae1I/AAAAAAAAADw/bVJQfYMtTig/s1600/Hans+de+Vries+Bulldozer.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="http://2.bp.blogspot.com/_KltGV0kj344/TPEr0Wzae1I/AAAAAAAAADw/bVJQfYMtTig/s400/Hans+de+Vries+Bulldozer.jpg" width="287" /></a></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-53021894321662634762010-11-27T14:54:00.003+01:002010-11-27T14:56:24.181+01:00Sandy Bridge prices<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/_KltGV0kj344/TO6LN4eFKgI/AAAAAAAAADo/I3Br81ERRJ8/s1600/Sandy.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="96" src="http://3.bp.blogspot.com/_KltGV0kj344/TO6LN4eFKgI/AAAAAAAAADo/I3Br81ERRJ8/s200/Sandy.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">"Hello, I'm Sandy Bridge"</td></tr>
</tbody></table><div align="justify">If you were wondering how much Sandy Bridge processors would cost, Expreview has <a href="http://en.expreview.com/2010/11/26/price-and-release-day-of-intel-sandy-bridge-processor-confirmed/12357.html">the answser for you</a>.</div><div align="justify"><br />
</div><div align="justify">They have a nice little table chart, so take a look at it if you want all the details. With prices ranging from $64 for the Pentium G620 (2 Cores, 2.6GHz, 3MB of L3, no HyperThreading, no CPU Turbo) to $317 for the Core i7-2600K (4 cores, 8 threads, 3.4GHz with Turbo up to 3.8GHz) there's something for everyone.</div><div align="justify"><br />
</div><div align="justify">However it's unfortunate that if you want the full chip with 4 cores, HT and Turbo, there's nothing below $294 (Core i7-2600). This is a clear sign of a lack of competition in this space. Hopefully, things will improve in Q2'11 with Bulldozer, but until then…</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-60341511540810419792010-11-26T21:28:00.000+01:002010-11-26T21:28:37.102+01:00NVIDIA's Endless City demo on Radeons<div style="text-align: right;"><a href="http://4.bp.blogspot.com/_KltGV0kj344/TPAYBu1iN9I/AAAAAAAAADs/KiTRjKdexpU/s1600/NVIDIA+logo.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="121" src="http://4.bp.blogspot.com/_KltGV0kj344/TPAYBu1iN9I/AAAAAAAAADs/KiTRjKdexpU/s320/NVIDIA+logo.png" width="320" /></a></div><div align="justify">Remember NVIDIA's Endless City demo? Here is how NVIDIA describes it: </div><div align="justify"><br />
</div><div align="justify"><em>Take a cruise through the most complex city ever rendered in real-time. NVIDIA’s Endless City harnesses the horsepower of our incredible tessellation engine to procedurally generate urban detail never before possible in an interactive world. Sit back, relax and enjoy the view.</em> </div><div align="justify">More <a href="http://www.nvidia.com/object/cool_stuff.html">here</a> > Demos > Endless City.</div><div align="justify"><br />
</div><div align="justify">When it was released, this demo wouldn't run on Radeons because it required CUDA, but it wasn't clear what CUDA was used for exactly. As it turns out, it's not used for anything at all, at least according to Scali's <a href="http://scalibq.wordpress.com/2010/11/25/running-nvidias-endless-city-tessellation-demo-on-radeons/">latest weblog post</a>. He found out that you can disable CUDA as well as a vendor check, and the demo runs just fine on any DX11 card, though apparently not very fast. He's even uploaded a patch to make it easy for everyone.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-25492909758267673072010-11-26T11:12:00.001+01:002010-11-27T00:28:27.103+01:00New Sandy Bridge benchmarks!<div align="justify">About three months ago, Anandtech published <a href="http://www.anandtech.com/show/3871/the-sandy-bridge-preview-three-wins-in-a-row">a performance preview</a> for Sandy Bridge, well ahead of its launch. And now, Inpai has just joined the party with a <a href="http://www.inpai.com.cn/doc/hard/138458_21.htm">preview of their own</a>. It's in Chinese, but the charts speak for themselves.</div><div align="justify"><br />
</div><div align="justify">Enjoy.<br />
<br />
<strong>UPDATE</strong><strong>:</strong> now it's also available in English, <a href="http://en.inpai.com.cn/doc/enshowcont.asp?id=7944&pageid=7672">here</a>. </div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-21191437377182059662010-11-25T20:18:00.000+01:002010-11-25T20:18:55.378+01:00NVIDIA Echelon<div align="justify">Xbitlabs has <a href="http://www.xbitlabs.com/news/video/display/20101124175100_Nvidia_Graphics_Chips_with_20TFLOPS_DP_Performance_Needed_for_ExaFLOPS_Supercomputers.html">a new piece</a> about NVIDIA's Echelon, a research project investigating heterogeneous computing in future ExaFLOPS (10^18 FLOPS) systems.</div><div align="justify"><br />
</div><div align="justify">NVIDIA isn't willing to share much more than a bunch of pretty slides with big numbers at this stage, but it's worth a look.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-50426687732963749952010-11-25T17:17:00.001+01:002010-11-25T17:18:56.998+01:00Sandy Bridge and low-end SKUs<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/_KltGV0kj344/TO6LN4eFKgI/AAAAAAAAADo/I3Br81ERRJ8/s1600/Sandy.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="154" src="http://3.bp.blogspot.com/_KltGV0kj344/TO6LN4eFKgI/AAAAAAAAADo/I3Br81ERRJ8/s320/Sandy.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Sandy Bridge, in its quad-core version.</td><td class="tr-caption" style="text-align: center;"></td></tr>
</tbody></table><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify">ComputerBase.de has managed to get <a href="http://www.computerbase.de/news/hardware/prozessoren/intel/2010/november/18-sandy-bridge-cpus-fuer-den-desktop/">a listing</a> for future low-end Intel processors based on Sandy Bridge.</div><div align="justify"><br />
</div><div align="justify">They have a nice little chart so I won't detail every single SKU, but I'll say this: first, Intel's naming scheme is still confusing as hell; second, even the lowly Pentium G620 has Turbo enabled for the graphics part (though not the CPU). This is both slightly unexpected and quite welcome, since that feature is key to Sandy Bridge's power efficiency. It's nice to see that even the bottom end isn't completely crippled. Though the CPU cores lack HyperThreading and Turbo, they still have a decent amount of L3 cache (3MB) and run at a respectable 2.6GHz, so I expect very respectable performance from this part.</div><div align="justify"><br />
</div><div align="justify">Fudzilla also has <a href="http://www.fudzilla.com/processors/item/20976-sandy-bridge-32nm-celeron-in-q3-2011">some information</a> about Sandy Bridge-based Celerons, but they don't seem to have the full specifications. Then again, it's supposed to be released in Q3'11, so those might not be set yet.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-61153751661188795172010-11-24T01:40:00.000+01:002010-11-24T01:40:52.067+01:00Good news about Llano<div align="justify">Charlie Demerjian has just published <a href="http://www.semiaccurate.com/2010/11/23/amds-llano-has-breakthrough/">a new article</a> about Llano over at SemiAccurate. The short story is that even though it initially ran into some pretty bad trouble, it's now doing a lot better and might actually be released sooner that AMD has let on so far.</div><div align="justify"><br />
</div><div align="justify">And the company is in a difficult competitive situation at the moment in the mobile market, so that's very good news for them.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-33397554372920339362010-11-22T21:45:00.003+01:002010-11-23T01:00:00.597+01:00Cayman specifications leaked<div align="justify">It seems the Polish website <a href="http://www.dyn-wp.frazpc.pl/">FrazPC</a> had a little mishap this morning. They mistakenly uploaded a lot of slides about Cayman, AMD's upcoming high-end GPU, apparently from a presentation recently given by AMD. I think the deadline was supposed to be today, so either FrazPC got the time wrong or the NDA changed. Either way, I—and others—had just enough time to save the relevant slides, so there you go:</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/_KltGV0kj344/TOrENTt2TaI/AAAAAAAAAC8/EHpHVYGEnTM/s1600/Fot_010.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://4.bp.blogspot.com/_KltGV0kj344/TOrENTt2TaI/AAAAAAAAAC8/EHpHVYGEnTM/s400/Fot_010.jpg" width="400" /></a></div><div align="justify" class="separator" style="clear: both; text-align: center;"> </div><div align="justify"><br />
</div><div align="justify">So first of all, we can finally confirm that Cayman is based on a VLIW4 architecture. I've talked about it <a href="http://thetexlog.blogspot.com/2010/11/geforce-gtx-580-radeon-hd-6970-delays.html">here</a>, so I won't dwell on it much, let's just say that a VLIW4 unit should be almost as fast as a VLIW5 one, but smaller. We "know" from the recent Antilles leak that Cayman has 30 SIMDs, and here we can see that there is still one quad-TMU per SIMD, so that's 120 TMUs total, a 50% increase over Cypress (HD 5870)! Cayman looks like a real texturing beast.</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/_KltGV0kj344/TOrFacMr27I/AAAAAAAAADA/pce0vDpnSj4/s1600/Fot_013.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://2.bp.blogspot.com/_KltGV0kj344/TOrFacMr27I/AAAAAAAAADA/pce0vDpnSj4/s400/Fot_013.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify">Cayman also features what you might call a distributed geometry engine, similar to Fermi's, but more limited. Still, it can process two primitives per clock, and at 900MHz or more, that's over 1800 Mtri/s. That should be amply sufficient in even the most demanding games, but I can't help feeling that there could still be further improvements down the road. The bit about "off-chip buffer support for high-tessellation levels" sort of raises a red flag: it appears that Cayman can't handle high tessellation very well relying solely on on-die resources. Surely, being able to use an off-chip buffer is better than just choking on excessive information as Cypress seems to do in such cases, but it's not exactly ideal either. As usual, it's a trade-off, of course.</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"></div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_KltGV0kj344/TOrG8YkccZI/AAAAAAAAADE/CwX2Pm9a9PI/s1600/Fot_014.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://1.bp.blogspot.com/_KltGV0kj344/TOrG8YkccZI/AAAAAAAAADE/CwX2Pm9a9PI/s400/Fot_014.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify">AMD promises tessellation performance at least 50% higher than that of Cypress, and 100+% higher with high tessellation factors. That's not nearly as good as Fermi, but I don't expect any game to reflect that.</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_KltGV0kj344/TOrHq_3FGOI/AAAAAAAAADI/cZHJ1X2DMwc/s1600/Fot_015.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://1.bp.blogspot.com/_KltGV0kj344/TOrHq_3FGOI/AAAAAAAAADI/cZHJ1X2DMwc/s400/Fot_015.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify">As expected, the new VLIW4 units combine simple SPUs to handle transcendentals, occupying 3 slots, as I mentioned in my previous post. Again, thanks go to Gipsel for providing that information. AMD claims similar performance with a ~10% area reduction compared to Cypress, which is exactly what I had predicted. I wish I could say that it was anything more than a lucky guess, but I really can't.</div><div align="justify">Also note that when the slide says "2 64-bit MUL or ADD", that's a mistake, it really means "either two 64-bit ADDs or one 64-bit MUL". Still, with up to one 64-bit MAD or FMA per clock, Cayman achieves a DP rate of ¼, which isn't bad at all. Obviously, removing the non DP-capable T unit has helped. GPGPU folks should be happy about that, especially since there's more:<br />
<br />
<b>UPDATE:</b> I hadn't even noticed, but the slide also says two 32-bit ADDs per cycle. That's obviously a mistake too: each VLIW unit is capable of four 32-bit ADDs per cycle. Once again, <a href="http://forum.beyond3d.com/showpost.php?p=1496579&postcount=5307">Gipsel was vigilant</a>. This just goes to show that you shouldn't drink and make slides. ;-)</div><div align="justify"><br />
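</div><div align="justify">To put that ¼ rate in perspective, here is a back-of-the-envelope peak-throughput calculation. The 1920 SPUs and the 900MHz clock are taken from the Antilles leak and my own estimate respectively, not confirmed specifications:</div>
<pre>
#include <stdio.h>

int main(void)
{
    /* Assumed figures, not official: 1920 SPUs (per the Antilles leak),
       900MHz core clock, and an FMA counted as 2 FLOPs. */
    const double spus      = 1920.0;
    const double clock_ghz = 0.9;

    /* Single precision: each SPU can retire one FMA per clock. */
    double sp_gflops = spus * 2.0 * clock_ghz;   /* ~3456 GFLOPS */

    /* Double precision: each 4-slot VLIW unit retires one FMA per clock,
       i.e. one quarter of the single-precision rate. */
    double dp_gflops = sp_gflops / 4.0;          /* ~864 GFLOPS */

    printf("Peak SP: %.0f GFLOPS\n", sp_gflops);
    printf("Peak DP: %.0f GFLOPS\n", dp_gflops);
    return 0;
}
</pre><div align="justify">For comparison, Cypress's ⅕ DP rate only gives it about 544 GFLOPS at 850MHz, so the reorganized units are a clear win for GPGPU work.</div><div align="justify"><br />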
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/_KltGV0kj344/TOrJvcdyOBI/AAAAAAAAADM/X1vv6AFRU2A/s1600/Fot_017.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://3.bp.blogspot.com/_KltGV0kj344/TOrJvcdyOBI/AAAAAAAAADM/X1vv6AFRU2A/s400/Fot_017.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify">There are a few welcome improvements here. Exactly how the L2 cache will be used is unclear, however.</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/_KltGV0kj344/TOrMZ0dzXUI/AAAAAAAAADQ/lBrpW_kDOFs/s1600/Fot_019_ed.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://4.bp.blogspot.com/_KltGV0kj344/TOrMZ0dzXUI/AAAAAAAAADQ/lBrpW_kDOFs/s400/Fot_019_ed.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify">This table doesn't tell us much, but I took the liberty of adding some information, based on the recent leak about Antilles, and an educated guess for the memory bandwidth.</div><div align="justify"><br />
</div><div align="justify">The next two slides introduce a new high-quality anti-aliasing mode…</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/_KltGV0kj344/TOrNBnMiAYI/AAAAAAAAADU/AZvIvTOhF3o/s1600/Fot_020.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://2.bp.blogspot.com/_KltGV0kj344/TOrNBnMiAYI/AAAAAAAAADU/AZvIvTOhF3o/s400/Fot_020.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_KltGV0kj344/TOrNwm3XOjI/AAAAAAAAADY/AoKDRm40BIE/s1600/Fot_021.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://1.bp.blogspot.com/_KltGV0kj344/TOrNwm3XOjI/AAAAAAAAADY/AoKDRm40BIE/s400/Fot_021.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify">…which Cayman should be able to handle just fine, thanks to seriously beefed-up ROPs:</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/_KltGV0kj344/TOrOCMBFWfI/AAAAAAAAADc/hQoqC5bKMZA/s1600/Fot_022.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://2.bp.blogspot.com/_KltGV0kj344/TOrOCMBFWfI/AAAAAAAAADc/hQoqC5bKMZA/s400/Fot_022.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify">But wait, there's more! AMD is introducing a new type of power management that they call "Power Containment":</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_KltGV0kj344/TOrOdcO31TI/AAAAAAAAADg/7abWe-rAoho/s1600/Fot_023.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://1.bp.blogspot.com/_KltGV0kj344/TOrOdcO31TI/AAAAAAAAADg/7abWe-rAoho/s400/Fot_023.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify">Exactly how this works is far from clear, but AMD has apparently substantially increased granularity for power management, with regard to both time and functional blocks. This feature is claimed to be user-controllable through AMD's Overdrive utility, but I doubt it really affords all that much control. At least, you should be able to disable it, which could be useful for overclocking. Beyond that, I doubt there's much that Overdrive lets you modify.</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/_KltGV0kj344/TOrQkUzV3OI/AAAAAAAAADk/zam8WvAgcdE/s1600/Fot_024.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://3.bp.blogspot.com/_KltGV0kj344/TOrQkUzV3OI/AAAAAAAAADk/zam8WvAgcdE/s400/Fot_024.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify">Well, I wasn't there at the presentation, and I'm not sure what this slide is about exactly, but it seems to highlight the fact that power containment allows the GPU to always remain exactly within TDP, without having to scale clocks down any lower than necessary, and apparently helps with idle power too. I'd suggest waiting for comments from someone who was actually there, though.</div><div align="justify"><br />
</div><div align="justify">All in all, when compared to Cypress, Cayman provides:</div><ul align="justify"><li>20% more SPUs, which are more efficient,</li>
<li>50% more (slightly less capable) VLIW units,</li>
<li>50% more TMUs,</li>
<li>100% higher geometry throughput per clock,</li>
<li>significantly improved ROPs,</li>
<li>10% higher bandwidth, maybe more,</li>
<li>higher clocks, most likely,</li>
<li>a few bits here and there…</li>
</ul><div align="justify">Judging from all this, I'm going on record saying it should be faster than the GTX 580 when it is released some time next month. In the meantime, we'll just have to wait. </div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-4477195706017611482010-11-22T19:04:00.001+01:002010-11-22T19:04:41.312+01:00ISSCC 2011 and new information<div align="justify">I'll keep it short this time. Dresdenboy has just published <a href="http://citavia.blog.de/2010/11/22/isscc-10026027/">a new blog post</a> on Citavia about the International Solid-State Circuits Conference 2011 (ISSCC) and you should read it because it contains new, juicy info. Here's a teaser:</div><div align="justify"><br />
</div><div align="justify"><i>4.5 Design Solutions for the Bulldozer 32nm SOI 2-Core Processor Module in an 8-Core CPU<br />
<span style="font-size: xx-small;">T. Fischer, S. Arekapudi, E. Busta, C. Dietz, M. Golden, S. Hilker, A. Horiuchi, K. A. Hurd, D. Johnson, H. McIntyre, S. Naffziger, J. Vinh, J. White, K. Wilcox, AMD</span><br />
The Bulldozer 2-core CPU module contains 213M transistors in an 11-metal layer 32nm high-k metal-gate SOI CMOS process and is designed to operate from 0.8 to 1.3V. This micro-architecture improves performance and frequency while reducing area and power over a previous AMD x86-64 CPU in the same process. The design reduces the number of gates/cycle relative to prior designs, achieving <b>3.5GHz+</b> operation in an area (including 2MB L2 cache) of <b>30.9mm²</b>.</i> </div><div align="justify"><br />
</div><div align="justify">And here are some figures about Sandy-Bridge, Westmere and Llano, for reference:</div><div align="justify"><br />
</div><table align="justify" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/_KltGV0kj344/TOqvzq6QxFI/AAAAAAAAAC4/m57SmQc6Lr8/s1600/Llano_vs_SandyBridge_vs_Westmere.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="320" src="http://1.bp.blogspot.com/_KltGV0kj344/TOqvzq6QxFI/AAAAAAAAAC4/m57SmQc6Lr8/s320/Llano_vs_SandyBridge_vs_Westmere.jpg" width="204" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Author: Hans de Vries</td><td class="tr-caption" style="text-align: center;"></td></tr>
</tbody></table><div align="justify"><br />
</div><div align="justify">As you can see, a Bulldozer module (2 cores) with 2MB of L2 cache is actually a bit smaller than 2 Llano cores with the same amount of cache! That's quite promising.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-65950975334667522882010-11-21T21:25:00.000+01:002010-11-21T21:25:16.596+01:00Intel releases SDK for OpenCL<div align="justify"><a href="http://1.bp.blogspot.com/_KltGV0kj344/TOl7f3cFegI/AAAAAAAAAC0/Vs1GjOMyEYo/s1600/OpenCL_Logo_RGB.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="http://1.bp.blogspot.com/_KltGV0kj344/TOl7f3cFegI/AAAAAAAAAC0/Vs1GjOMyEYo/s200/OpenCL_Logo_RGB.png" width="200" /></a>When Intel started talking about Sandy-Bridge, their upcoming CPU/GPU—or APU, if you will—architecture a while ago, they mentioned that it would be compatible with OpenCL, an open framework for parallel programming on a broad range of architectures, aimed at taking advantage of heterogeneous systems with traditional CPU cores and more parallel ones, for instance GPUs. OpenCL is managed by the Khronos Group, and backed by AMD, Apple, NVIDIA, and now Intel.</div><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify">Indeed, considering their recent announcement regarding OpenCL and Sandy-Bridge, it should come as no surprise that they have just released <a href="http://software.intel.com/en-us/articles/intel-opencl-sdk/">their own SDK for OpenCL</a>, albeit in an Alpha version. With Intel, AMD, Apple and NVIDIA actively supporting it, OpenCL now has potential to become the standard for parallel computing. Granted, NVIDIA would probably like you to use CUDA instead, but they will support any initiative that takes advantage of their GPUs for compute purposes.</div><div align="justify"><br />
</div><div align="justify">The obvious advantage of OpenCL is that it's compatible with most widely-used architectures. That's not to say that you can just write your code once and have it run blissfully fast on all parallel processors, though. Unfortunately, some amount of tuning will always be necessary to extract performance from specific architectures, but at least, with OpenCL, you can do so using one language, sharing some code, and using one set of tools. As such, it's a <em>huge</em> improvement over having to use CUDA for NVIDIA, Brook+/CAL/CTM for ATI/AMD, and traditional programming languages for CPUs.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-71623728563671601092010-11-21T20:54:00.001+01:002010-11-21T20:56:13.000+01:00AMD Antilles specifications leaked<div align="justify">There's a lot going on with AMD these days. An apparently genuine slide with some specifications for Antilles, or the AMD Radeon HD 6990, an upcoming dual-GPU card has just surfaced <a href="http://www.forum-3dcenter.org/vbulletin/showpost.php?p=8403235&postcount=114">here.</a></div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/_KltGV0kj344/TOl2wB7SO4I/AAAAAAAAACw/VsJ7m3_nehA/s1600/Antilles.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="http://4.bp.blogspot.com/_KltGV0kj344/TOl2wB7SO4I/AAAAAAAAACw/VsJ7m3_nehA/s320/Antilles.jpg" width="320" /></a></div><div align="justify"><br />
</div><div align="justify">This tells us that Cayman, the GPU on which Antilles is based, has at least (and probably exactly) 1920 SPUs. It also tells us that the GPUs in Antilles are clocked at 775MHz, since it can output 3100 million triangles per second, and Cayman is rumored to be able to produce 2 triangles per clock. It's the only way the 3100 Mtri/s figure makes sense anyway, so this rumor must be true.</div><div align="justify"><br />
</div><div align="justify">Managing to put a pair of such large GPUs at 775MHz in a 300W card is quite impressive, and from that, we an infer that the single-GPU Radeon HD 6970 should be clocked at 900MHz or above. I would estimate its maximum power draw at about 225W, perhaps a tad more since it features a 6-pin and an 8-pin power connector, providing the card with up to 300W.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4230000610847684173.post-72027070504544983772010-11-18T13:05:00.002+01:002010-11-18T13:08:14.925+01:00Bulldozer and Llano roadmaps leaked<div align="justify">The guys from ATI-Forum.de managed to get their hands on <a href="http://news.ati-forum.de/index.php/news/35-amd-prozessoren/1607-exklusiv-roadmap-der-qzambeziq-cpu-und-der-kommenden-apu">roadmaps</a> for the client version of Bulldozer, otherwise known as Zambezi, and for Llano (desktop only).<br />
</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/_KltGV0kj344/TOUUK3LLlZI/AAAAAAAAACo/K-4zIdWHYVg/s1600/1198193-apu-roadmap.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://3.bp.blogspot.com/_KltGV0kj344/TOUUK3LLlZI/AAAAAAAAACo/K-4zIdWHYVg/s400/1198193-apu-roadmap.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify">This is very much in line with what AMD revealed during Analyst Day: Llano will be out in mid-2011, available in 2-, 3- and 4-core versions. Apparently, only two different power envelopes will coexist, at least initially: 65W and 100W. I'm sure 45W products will follow in Q4. Obviously, mobile SKUs will have much lower TDPs.<br />
</div><div align="justify"><br />
</div><div align="justify" class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/_KltGV0kj344/TOUVATB1-1I/AAAAAAAAACs/xa7qoJUhiyQ/s1600/1198195-zambezi-roadmap.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://4.bp.blogspot.com/_KltGV0kj344/TOUVATB1-1I/AAAAAAAAACs/xa7qoJUhiyQ/s400/1198195-zambezi-roadmap.jpg" width="400" /></a></div><div align="justify"><br />
</div><div align="justify"><br />
</div><div align="justify">We already knew that Bulldozer would make its way into desktops in Q2, but now it even seems to be around May, which is unexpected, but very good news. It will be offered with 8 cores first, and in 125W as well as 95W versions. Then, 6- and 4-core versions will follow, all within 95W. I wonder whether there will be 65W versions later.</div><div align="justify"><br />
</div><div align="justify">It appears that AMD might be competitive in high-end desktops in the first half of 2011 after all, which is great because it hasn't been the case since 2006.</div>Unknownnoreply@blogger.com0