Monday 23rd July 2007

Dissecting DirectX 10

Posted at: Monday 23rd July 2007 by Stuart Andrews

Stuart Andrews takes a journey through a DirectX 10 3D graphics pipeline, and explains how GPU architecture has changed since the DirectX 9 days.

'The critical thing,' he explains, 'is to look at the typical workload that comes through for games now, and also for the games you believe you'll see over the next year, and then figure out how much of the workload is straightforward vector work, how much is basic scalar work and how much is difficult scalar work.' ATi's engineers believed that only 15-20 per cent of the workload sat in the latter category, so they allocated those jobs to one in five of the ALUs.

Nvidia, however, has a slightly different take, hiving off its 'special function units (SFUs)' from the main stream processors (SPs) and pairing them, with 16 in each cluster. However, as Nvidia's SFU logic also functions for attribute interpolation, these SFUs only function at roughly a quarter of the speed of the main SPs. 'We optimise the design of our standard SPs for the highest performance with the most commonly executed instructions,' says Nvidia's director of technical marketing, Nick Stam, 'without including a bunch of additional logic for special cases, such as more complex instructions that don't need to be executed as frequently. This is a key reason as to why we have separate units.'

It's also worth mentioning that Nvidia and ATi have equally divergent approaches on the clock speed of their ALUs. ATi's approach is pretty straightforward: simply pack them in en masse and run them at the same clock speed as the rest of the GPU. Nvidia's approach, however, uses fewer SPs, but runs them at double the GPU clock speed. AMD's Richard Huddy admits this is a perfectly valid choice. 'If you can genuinely double-clock the ALUs inside the engine, you'd be tempted to have half as many.' There's a risk of latency between the two differently clocked domains, but it isn't a fatal issue, and it allows Nvidia to use a smaller, more power-efficient architecture.
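Huddy's point about halving the ALU count comes down to simple arithmetic: peak throughput scales with the number of units multiplied by their clock speed. The sketch below illustrates that trade-off. The unit counts and the 500MHz base clock are hypothetical round numbers chosen for illustration, not the specifications of any real chip, and the function name is our own.

```python
def peak_ops_per_second(units: int, clock_mhz: float,
                        ops_per_unit_per_clock: int = 1) -> float:
    """Theoretical peak throughput: units x clock x ops per unit per clock."""
    return units * clock_mhz * 1e6 * ops_per_unit_per_clock


# Hypothetical figures, for illustration only.
BASE_CLOCK_MHZ = 500.0

# Many ALUs running at the same clock as the rest of the GPU.
wide_and_slow = peak_ops_per_second(units=64, clock_mhz=BASE_CLOCK_MHZ)

# Half as many ALUs, double-clocked.
narrow_and_fast = peak_ops_per_second(units=32, clock_mhz=2 * BASE_CLOCK_MHZ)

print(wide_and_slow == narrow_and_fast)
```

On paper the two designs come out level. What the sum doesn't capture is how often each design can keep its units busy, nor the latency between clock domains mentioned above.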

What's more, there's also instruction width and granularity to consider. Each shader unit in the G80 can handle two instructions per clock against the R600's six, or process 32 pixels per clock against the R600's 64. However, while this might seem like a bad thing, it also means that the G80's scheduling hardware has less work to do in finding independent instructions and scheduling them together in order to maintain optimal efficiency. 'The G80 is essentially a single instruction-issue architecture,' says Stam, 'which gives it full efficiency, irrespective of the vector size or dependencies in the code stream. For the R600, performance is a function of the compiler's ability to sniff out six independent instructions per cycle - a really tough job!' This may be one reason, among others, for Nvidia's high-end 8800GTX outperforming ATi's HD 2900XT, even though the latter has two and a half times as many shader units.
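To see why packing independent instructions is 'a really tough job', consider the toy model below. It greedily fills issue bundles of a given width, only admitting an instruction once everything it depends on has finished in an earlier bundle. This is a deliberately simplified sketch, not how either vendor's compiler or scheduler actually works; the function names and the two example instruction streams are invented for illustration.

```python
from typing import Dict, List, Set


def pack_bundles(deps: Dict[str, Set[str]], order: List[str],
                 width: int) -> List[List[str]]:
    """Greedily pack instructions into bundles of at most `width` slots.

    An instruction may join a bundle only if every instruction it
    depends on completed in an earlier bundle.
    """
    done: Set[str] = set()
    remaining = list(order)
    bundles: List[List[str]] = []
    while remaining:
        bundle = [i for i in remaining if deps[i] <= done][:width]
        if not bundle:
            raise ValueError("dependency cycle or unknown dependency")
        bundles.append(bundle)
        done.update(bundle)
        remaining = [i for i in remaining if i not in bundle]
    return bundles


def slot_utilisation(bundles: List[List[str]], width: int) -> float:
    """Fraction of issue slots that actually carried an instruction."""
    used = sum(len(b) for b in bundles)
    return used / (len(bundles) * width)


# Hypothetical instruction streams (names are placeholders).
chain = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}}   # each op needs the last
independent = {name: set() for name in "abcdef"}           # no dependencies at all

wide_on_chain = slot_utilisation(pack_bundles(chain, list(chain), 6), 6)
scalar_on_chain = slot_utilisation(pack_bundles(chain, list(chain), 1), 1)
wide_on_independent = slot_utilisation(
    pack_bundles(independent, list(independent), 6), 6)

print(wide_on_chain, scalar_on_chain, wide_on_independent)
```

In this model, a six-wide unit fed a chain of dependent operations fills only one slot in six, while a single-issue unit stays fully occupied on the same code. Give the wide unit six independent operations and it, too, runs at full efficiency, which is why so much rests on what the compiler can find.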

Meet the Management

To manage the workers, ATi still uses a dispatch processor, as seen in the R580. However, in the R600, the 'line manager' has a more difficult job. While the R580's dispatch processor simply had to schedule various pixel-shader duties, and swap them in and out to optimise the workflow, the R600's dispatch processor has to schedule three separate queues of pixel, vertex and geometry work across the same bank of ALUs, making use of their ability to switch between pixel and vertex, or vertex and geometry work within a single clock cycle. Meanwhile, Nvidia's G80 uses a combination of a main global scheduler and a series of local schedulers (one per cluster) to perform basically the same task.
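As a rough picture of what scheduling three queues across one bank of ALUs involves, the sketch below hands out work from pixel, vertex and geometry queues to a shared pool in round-robin order, one cycle at a time. It's a minimal illustration under our own assumptions: the round-robin policy, the function name and the work-item labels are all hypothetical, and the real dispatch logic in the R600 or G80 is far more sophisticated.

```python
from collections import deque
from typing import Dict, List, Tuple


def dispatch(queues: Dict[str, List[str]],
             num_alus: int) -> List[List[Tuple[str, str]]]:
    """Assign queued work items to a shared pool of ALUs, cycle by cycle.

    Each cycle, queues are visited in round-robin order until every ALU
    has a job or no work is left. Returns one list of (kind, item)
    assignments per cycle.
    """
    if num_alus < 1:
        raise ValueError("need at least one ALU")
    pending = {kind: deque(items) for kind, items in queues.items()}
    schedule: List[List[Tuple[str, str]]] = []
    while any(pending.values()):
        cycle: List[Tuple[str, str]] = []
        while len(cycle) < num_alus and any(pending.values()):
            for kind, queue in pending.items():
                if queue and len(cycle) < num_alus:
                    cycle.append((kind, queue.popleft()))
        schedule.append(cycle)
    return schedule


# Hypothetical workload: pixel-heavy, with a little vertex and geometry work.
work = {
    "pixel": ["p0", "p1", "p2", "p3"],
    "vertex": ["v0"],
    "geometry": ["g0"],
}
schedule = dispatch(work, num_alus=4)
for number, cycle in enumerate(schedule):
    print(number, cycle)
```

Because any ALU can take any kind of job, the spare capacity left once the vertex and geometry queues run dry goes straight to pixel work, rather than sitting idle as it would in a design with fixed-function pools.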

Comments

It's stupid to make everybody in the world upgrade to Vista eventually, and it's even more annoying for us gamers and PC modders. I think it's all just a waste of time.

Comment by Maddwilz at 7:56pm 10th August 2007



WAGHHHHHH VISTA NO WORKEEEE

theres talk about microsoft doing a turn around and offerin dx10 as an update coz nae body wants 2 spend a fortune on upgrade just like me lol im going back 2 my amiga 1200 and wipeout 2097 yassssssssssss

Comment by GUMBANATOR at 7:23pm 5th August 2007



A Vista Work Around

If you want all the Dx10 benefits in Company of Heroes without getting a Vista machine here is what you do. Take out half your RAM, underclock your CPU to about 75% of the speed and then replace your graphics card with an X1300 or similar. That should nicely replicate the prolapsed frame rate and compromised graphics settings enjoyed by Dx10 Company of Heroes players (unless they have beta drivers).

Comment by Grotmonkey at 9:46pm 31st July 2007



Do I really need Vista for dx10?

I REALLY don't want to buy Vista just to play a DX10 game. Anyone know any way around it?

Comment by clipkilla at 6:04pm 30th July 2007


