Stuart Andrews takes a journey through a DirectX 10 3D graphics pipeline, and explains how GPU architecture has changed since the DirectX 9 days.
DirectX 10 architecture is confusing. You aren't just sending instructions through a pipeline, but also circulating them through a complex interconnected system. Vertex data processed by the vertex portion of the setup engine, the dispatch processor and the ALUs ends up back at the setup engine, ready to be rasterized and sent through the pixel portion of the setup engine, then on to the dispatch processor and the ALUs once more. However, this heavily threaded, unified approach offers a huge advantage.
In a DirectX 9 GPU, everything ran smoothly, provided the mix of pixel and vertex work matched the balance for which the architects had designed the GPU. If a game running on an R580 was performing about six times as much work with the pixel engines as it was with the vertex engines, then 100 per cent of the shader units were being used and everything was running at full speed. If, however, the pixel engines were performing ten times the work of the vertex engines, the latter would sit idle for part of the time, while the former became clogged up. Worse, if the vertex engines had the same workload as the pixel engines, they would be overloaded, while the pixel engines were starved of data.
This doesn't happen with a unified architecture. Provided the command processor, setup engine and dispatch processor (or the setup engine and global/local schedulers in the G80) work properly, all the units should be busy all the time. In the words of ATi's Richard Huddy, 'a piece of hardware that used to occasionally dip well below 50 per cent efficiency because it was just one part of the pipeline, and even that wasn't used 100 per cent efficiently' has become one 'where it's hard to see any part of the chip ever spending available work cycles not doing anything.'
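To put rough numbers on that comparison, here is a minimal Python sketch of a fixed pixel/vertex split. It is a toy model, not a description of real hardware: the function name is hypothetical, the 48 pixel and 8 vertex units are an assumption chosen to match the R580's six-to-one balance, and work is assumed to divide perfectly across the units in each pool.

```python
def fixed_utilisation(pixel_work, vertex_work, pixel_units=48, vertex_units=8):
    """Fraction of shader-unit time spent busy when the two pools are separate.

    Toy model (assumption): each pool shares out its own work evenly, and the
    frame is finished only when the slower of the two pools is finished.
    """
    frame_time = max(pixel_work / pixel_units, vertex_work / vertex_units)
    total_units = pixel_units + vertex_units
    # Useful work done, divided by the unit-time that was available.
    return (pixel_work + vertex_work) / (total_units * frame_time)


# Pixel-to-vertex work ratios taken from the scenarios in the text.
for pixel_work, vertex_work in ((6, 1), (10, 1), (1, 1)):
    share = fixed_utilisation(pixel_work, vertex_work)
    print(f"{pixel_work}:{vertex_work} -> {share:.0%} of units busy")
```

Under these assumptions the fixed design is fully used at 6:1, drops to roughly 94 per cent at 10:1, and falls to roughly 29 per cent when vertex and pixel work are equal. An idealised unified pool of the same 56 units would stay at 100 per cent in all three cases, because any unit can pick up either kind of work.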
Got all that? Good. Now let's follow a typical 3D workload through the various departments of a DirectX 10 GPU.
The Front End
The first thing that graphics data exiting the CPU hits is the GPU's 'front end' - the block devoted to organising instructions and incoming data, and then streaming them to the rest of the GPU. In a DirectX 9 GPU such as the R580, the incoming vertex data pretty much went straight to the vertex shaders for processing; in the R600, however, it arrives at the command processor. This checks that the hardware is correctly configured for the task ahead - a job that has shifted from the CPU to the GPU for bandwidth and latency reasons - then batches incoming instructions into sensible chunks in order to minimise latency later on. Nvidia has employed equivalent logic in the G80. This batched data hits a global setup engine, which looks at the incoming instructions and separates them into three queues: pixel, geometry and vertex work.
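The batching and sorting described above can be pictured with a short Python sketch. Everything in it is illustrative: the function names, the batch size and the (kind, payload) command format are assumptions made for the example, and real command processors and setup engines do this in hardware with far more state.

```python
from collections import deque
from itertools import islice


def batch(commands, size):
    """Group incoming commands into chunks of `size` (the last may be shorter)."""
    iterator = iter(commands)
    while chunk := list(islice(iterator, size)):
        yield chunk


def sort_into_queues(batches):
    """Split batched (kind, payload) commands into the three work queues."""
    queues = {"vertex": deque(), "geometry": deque(), "pixel": deque()}
    for chunk in batches:
        for kind, payload in chunk:
            queues[kind].append(payload)
    return queues


# Hypothetical command stream arriving from the CPU.
commands = [
    ("vertex", "v0"), ("pixel", "p0"), ("vertex", "v1"),
    ("geometry", "g0"), ("pixel", "p1"),
]
queues = sort_into_queues(batch(commands, size=2))
```

After this runs, each queue holds only its own kind of work in arrival order, ready to be handed on to whatever shader units are free.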
Vertex Setup
For now, let's concentrate on the last of these. The command processor takes the vertex data coming from the CPU and passes it to the vertex assembler, ready for vertex shader operations in the ALUs. In the R600, it also passes through the tessellator beforehand. The vertex shaders then carry out their work, and shunt the data off to one side, where instructions from the setup engine can use it for the next stage; alternatively, they can write it to the 'stream out' cache, where other vertex or geometry shaders can use it. For example, by combining 'stream out' and a geometry shader, the vertex data can be used to generate a displacement map and create surface relief, or build fully 3D volumetric shadows.
It's stupid to make everybody in the world upgrade to Vista eventually, and it's even more annoying for us gamers and PC modders. I think it's all just a waste of time.
There's talk about Microsoft doing a turnaround and offering DX10 as an update, because nobody wants to spend a fortune on an upgrade, just like me, lol. I'm going back to my Amiga 1200 and Wipeout 2097, yasssss.
If you want all the Dx10 benefits in Company of Heroes without getting a Vista machine here is what you do. Take out half your RAM, underclock your CPU to about 75% of the speed and then replace your graphics card with an X1300 or similar. That should nicely replicate the prolapsed frame rate and compromised graphics settings enjoyed by Dx10 Company of Heroes players (unless they have beta drivers).
I REALLY don't want to buy Vista just to play a DX10 game. Anyone know any way around it?