

Verdict: Nvidia's first new high-end single GPU in nearly two years is the biggest GPU ever made - it finally brings high resolution Crysis to reality, but you pay a big price for its pace
The new GeForce GTX 200 is, Nvidia claims, the largest and most complex graphics processing unit (GPU) ever made. Featuring over 1.4 billion transistors, 240 stream processors and a 512-bit memory interface, it’s certainly a substantial piece of silicon (you can see how substantial in this video of us taking one apart).
Unlike the GeForce 9-series, where a change in the naming of a graphics card reflected very little change in the actual silicon, the GTX 200 GPUs are substantially different from any product Nvidia has launched before. You can read more about the change in the naming convention here, but in this article we’ll delve straight into the new GPU, its composition, characteristics and performance.
The GTX 200-series is launching in two flavours, the GTX 280 and the lesser GTX 260.
On paper, the differences between the GTX 280 and GTX 260 are quite pronounced, although since we’ve been told that GTX 260 cards will be delayed by a couple of weeks it’s not a difference we’ve been able to quantify with testing. The delay does seem strange given that both GPUs are clearly derived from the same design, so either yields of dies good enough to be GTX 280s are very high or Nvidia wants to push the high-end GTX 280 cards for a while before it lets the world see the performance and price difference between the GTX 280 and 260.
ARCHITECTURE ANALYSIS
Nvidia has stated that its architectural design goals with the GeForce GTX 200 GPU were to:
Some of these
design goals seem good targets for the engineers to set themselves – better
power efficiency, better DirectX 10 performance and the rebalancing of the
architecture are all laudable.
The first
design goal is rather spurious though – the GeForce 8800 GTX launched in
November 2006, so a GPU released 20 months later should of course be substantially
faster. It also ignores the fact that Nvidia has made faster graphics cards
since – the GeForce 9800 GX2, for example, although this admittedly uses two
GPUs to achieve its fast frame rates.
| | GTX 280 | GTX 260 | 9800 GX2 | 8800 GTX |
|---|---|---|---|---|
| GPU | GTX 280 | GTX 260 | 2 x G92 | G80 |
| Core speed | 602MHz | 576MHz | 600MHz | 575MHz |
| Stream processors | 240 | 192 | 2 x 128 | 128 |
| Stream processor speed | 1.296GHz | 1.242GHz | 1.5GHz | 1.35GHz |
| Memory | 1GB GDDR3 | 896MB GDDR3 | 2 x 512MB GDDR3 | 768MB GDDR3 |
| Memory speed | 1.107GHz (2.214GHz effective) | 999MHz (1.998GHz effective) | 1GHz (2GHz effective) | 900MHz (1.8GHz effective) |
| Memory interface | 512-bit | 448-bit | 2 x 256-bit | 384-bit |
| Memory bandwidth | 142GB/sec | 112GB/sec | 64GB/sec (per GPU) | 86.4GB/sec |
| ROPs | 32 | 28 | 2 x 16 | 24 |
| Texture units | 80 | 64 | 2 x 64 | 64 filter, 32 address |
| Power inputs | 1 x 8-pin, 1 x 6-pin PCI-E | 2 x 6-pin PCI-E | 1 x 8-pin, 1 x 6-pin PCI-E | 2 x 6-pin PCI-E |
Take a look at the spec table and it isn’t clear immediately where Nvidia has ‘rebalanced’ the G80 architecture of the GeForce 8800 GTX when designing the GTX 280. In fact, it looks more as if Nvidia has just added more ‘stuff’ to the design – more stream processors, more memory, more memory bandwidth, more ROPs; more of everything.
It’s only when you look in more detail at how Nvidia has organised these resources that you get a feel for how its engineers have attempted to balance the component parts of the GTX 280. We’ll outline the major upgrades, which will give us a better understanding of how and why the GTX 280 performs as it does.
BRAND NEW FEATURES
Despite GTX
200 being referred to by the company as Nvidia’s ‘second generation unified
architecture’ (in a recent briefing senior Nvidia representatives jokingly
called G90/G92 its Gen 1.5 unified architecture) the GTX 200 does not support
DirectX 10.1 as ATI’s Radeon HD 3000-series GPUs do.
Nvidia says that some features of DirectX 10.1 are already supported in its current architectures anyway (multisample readback, for example) while ‘key software development partners indicated that DirectX 10.1 was not important’, so Nvidia ignored it with the GTX 200.
Genuinely new features are actually few in the GTX 200 GPU – generally the improvements are just that: improvements over previous generation GPUs. For example, the GTX 200 series will support Nvidia PhysX for GPU-accelerated physics effects in games. However, Nvidia PhysX will also run on G80 and G90 GPUs, although probably not as well.
Nvidia is also claiming the double-precision floating point units of the GTX 200 GPU as a new feature, but these are actually just improvements on the single-precision floating point units of the G80 and G90 GPUs. This spec bump brings good benefits though – a GTX 200 series GPU can handle 128-bit floating point numbers (a 39-digit number which can include a decimal place) without the need to break them into two halves as with G80 and G90 GPUs. This allows greater speed when handling high-precision tasks such as 128-bit HDR with AA. More precision means more accurate colours, and the opportunity for a wider range of colour and light effects.
The GTX 200 GPU also has more floating point units than the G80 or G90 GPUs, again helping increase performance and speed.
To tick off the other new features, the GTX 200 now supports 10-bit colour depth processing and output, whereas G80 and G90 could only output in 8-bit colour depth. However, 10-bit colour output is only possible over DisplayPort, and you’ll only see the benefits if you also have a 10-bit TFT. There’s also dual-stream hardware acceleration so you can watch two HD streams in Picture-in-Picture mode.
The rest of the ‘new’ features are best explained as architectural upgrades and improvements, so let’s take a look at what the GPU has inside it.
UNIFIED SHADER ARCHITECTURE BACKGROUND
Before moving
on, let’s clarify how the internals of a modern GPU are organised. Since the
GeForce 8-series, Nvidia’s GPUs have used a unified shader architecture, which
is very different from traditional GPU designs, which utilised a number of discrete pixel
and vertex shader units. These could only work on specific pieces of shader
code (i.e. pixel shader units couldn’t crunch vertex shader code). Having a
fixed approach meant that often the GPU didn’t have the resources a game
required, and couldn’t adapt to changing environments. Consider a typical RPG
such as Oblivion. If you’re in a cave, there’s not a lot of geometry work required
to create the environment, as the cave is relatively simple and there will be
only a few objects (such as a couple of goblins, perhaps a chest or two). To
make these objects look good, the GPU has to calculate lots of complex pixel shader
code such as HDR lighting effects, reflections and shininess for slime on the
rocks and so on. However, when you go outside the cave, the balance of work changes:
with the draw distance on full, there’s more terrain to generate, plus a huge
amount of vegetation, all made up of vertices, so you need more vertex shader
power.
With a unified architecture, there’s no distinction between pixel and vertex pipelines. There are only stream processors, and each processor is capable of being dynamically allocated to vertex, pixel, geometry, or physics operations. The benefit is clear, since with a unified architecture, each part of the GPU can be kept busier for longer regardless of the type of scene being rendered. For example, instead of the vertex pipes lying largely idle when a 3D scene is geometrically simple, the stream processors can be reconfigured to work on whichever task the game throws at the GPU. The GPU’s dispatch and control logic dynamically assigns work to the stream processors, and this occurs automatically so that game developers don’t need to worry about it.
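To put rough numbers on the cave-versus-outdoors example, here is a toy model of our own rather than anything from Nvidia. The unit counts and workload figures are invented purely for illustration, and the model ignores scheduling overhead entirely. It simply compares how long a frame takes when units are locked to one shader type with how long it takes when any unit can do any job.

```python
# Toy model: fixed pixel/vertex units versus a unified pool of the same size.
# All unit counts and workload figures are made-up illustrative numbers.

def fixed_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Frame time when each unit type can only run its own kind of shader."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_time(vertex_work, pixel_work, total_units):
    """Frame time when any unit can run any kind of shader."""
    return (vertex_work + pixel_work) / total_units


# (vertex work, pixel work) in arbitrary units of shader work per frame
scenes = {
    "cave (pixel-heavy)": (10, 230),
    "outdoors (vertex-heavy)": (120, 120),
}

for name, (vertex_work, pixel_work) in scenes.items():
    fixed = fixed_time(vertex_work, pixel_work, vertex_units=8, pixel_units=24)
    unified = unified_time(vertex_work, pixel_work, total_units=32)
    print(f"{name}: fixed {fixed:.2f}, unified {unified:.2f}")
```

In this sketch the fixed design is always held up by whichever unit type is overloaded, while the unified pool takes the same time for both scenes because no unit sits idle.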
HOW THE SHADERS ARE ORGANISED
Inside a unified shader GPU you won’t just find a jumble of stream processors all eager to start
rendering your favourite game’s lovely graphics code. The resources of an
Nvidia GPU are organised into what Nvidia calls TPCs (Texture Processing
Clusters). We’ll call them ‘clusters’, because that’s a more user-friendly word
than yet another TLA.
Each cluster is made up of sub-units which Nvidia calls Streaming Multiprocessors (SMs), and each SM has a setup unit to assign work, a handful of stream processors, a register, and a handful of texture units to handle texture-based tasks. Click here for a diagram of a cluster to see what we mean. Have it handy in another tab or window if you like.
A GeForce 9800 GTX has eight clusters which each have two SMs. Each SM has eight stream processors, so the GeForce 9800 GTX has 128 stream processors (8 x 2 x 8 = 128).
The GTX 200 has ten clusters, which each have three SMs. Again each SM has eight stream processors, so we can now see why the GTX 280 has 240 stream processors (10 x 3 x 8 = 240) while the GTX 260 has the odd-looking figure of only having 192 stream processors. Clearly the GTX 260 has two of its clusters disabled as 8 x 3 x 8 = 192.
Here’s where Nvidia justifies its claim that it has rebalanced the architecture. The number of texture units in a cluster (eight) has remained the same as in previous generations, while the number of stream processors in a cluster has increased from 16 (i.e. 8 x 2) to 24 (i.e. 8 x 3). This, Nvidia says, reflects the needs of modern-day games which are using ever more demanding shader programs (which run on stream processors) but not more detailed textures.
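The cluster arithmetic above is easy to check for yourself. The short sketch below uses only the figures quoted in this article; the dictionary layout and function names are our own and purely illustrative.

```python
# (clusters, SMs per cluster, stream processors per SM), as quoted in the text
configs = {
    "GeForce 9800 GTX": (8, 2, 8),
    "GeForce GTX 260": (8, 3, 8),
    "GeForce GTX 280": (10, 3, 8),
}

TEXTURE_UNITS_PER_CLUSTER = 8  # unchanged between generations, per the text


def stream_processors(clusters, sms_per_cluster, sps_per_sm):
    """Total stream processors on the GPU."""
    return clusters * sms_per_cluster * sps_per_sm


def sps_per_texture_unit(sms_per_cluster, sps_per_sm):
    """Stream processors available per texture unit within one cluster."""
    return sms_per_cluster * sps_per_sm / TEXTURE_UNITS_PER_CLUSTER


for name, (clusters, sms, sps) in configs.items():
    total = stream_processors(clusters, sms, sps)
    ratio = sps_per_texture_unit(sms, sps)
    print(f"{name}: {total} stream processors, {ratio:.0f} per texture unit")
```

The ratio moving from two stream processors per texture unit to three is the ‘rebalancing’ in a single number.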
The eight clusters of the G80 GPU (GeForce 8800 GTX) each had, in addition to their stream processors, eight texture filter units and four texture address units. The G92 GPU (GeForce 9800 GTX) had eight texture filters and eight texture address units in each of its eight clusters. The GTX 200 keeps this equal balance of texture filter and texture address units (eight and eight) but, Nvidia claims, they’re more advanced.
While on the subject of clusters, we should mention their double-sized registers. This means that there’s twice as much room to store complex shader programs and other data within each cluster of SMs than there was before. This avoids the need to store lengthy shader programs in graphics memory and incur the time penalty of fetching them back into an SM every time they’re needed.
To round off the improvements at cluster level, the internal output buffer has been upsized by a factor of six over previous generations, which will help improve the performance of geometry shading and stream out. There are also improved z-cull algorithms, allowing the GPU to drop unnecessary work earlier.
According to Nvidia, the GTX 200 drivers have also been coded with a more efficient communication protocol to aid data flow into the GPU. Once data has been stuffed into the GPU, Nvidia says that the GTX 200 has better instruction scheduling, better instruction issue and better register allocation than its previous GPUs. The thread dispatch engine can therefore flood the GPU with work to ‘close to theoretical peak performance’, and it’s 22% more efficient than the same unit of the G90.
BACK END AND MEMORY
The GTX 280
has 32 ROPs to handle the output from its clusters, compile the final frame and
apply AA. These ROPs are referred to as being ‘full-speed’ while the ROPs of
the G80, for example, ran at ‘half-speed’. A G80, with its 24 ROPs, could output
24 pixels per clock to the frame buffer and blend only 12 pixels per clock. The
GTX 280 can output and blend 32 pixels per clock.
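As a rough guide to what those per-clock figures mean, the sketch below multiplies them by the core speeds from the spec table. It assumes the ROPs run at the core clock, which Nvidia’s briefing as reported here doesn’t state outright, so treat the results as theoretical peaks rather than measured fill rates. The function and dictionary names are ours.

```python
def gpixels_per_second(pixels_per_clock, core_mhz):
    """Theoretical peak rate, assuming the ROPs run at the core clock."""
    return pixels_per_clock * core_mhz / 1000


# Per-clock figures from the text, core speeds from the spec table
rops = {
    "GeForce 8800 GTX (G80)": {"core_mhz": 575, "output": 24, "blend": 12},
    "GeForce GTX 280": {"core_mhz": 602, "output": 32, "blend": 32},
}

for name, spec in rops.items():
    output = gpixels_per_second(spec["output"], spec["core_mhz"])
    blend = gpixels_per_second(spec["blend"], spec["core_mhz"])
    print(f"{name}: output {output:.1f} Gpixels/sec, blend {blend:.1f} Gpixels/sec")
```

On these assumptions the GTX 280’s peak output rate is around 40 per cent higher than the G80’s, but its peak blend rate is closer to three times as high.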
The GTX 280 uses a massive 512-bit wide memory interface, bigger than the 384-bit wide interface the GeForce 8800 GTX used, and double that of the Radeon HD 3870. It comprises eight 64-bit interface units (again, this explains the odd 448-bit memory interface of the GTX 260 – clearly this GPU has one of its memory interface units disabled). This is paired with 1GB of GDDR3 memory running at 1,107MHz (2,214MHz effective) – this is an odd number too, but Nvidia does say that the memory interface units of the GTX 200 are rated up to 1.1GHz, so perhaps this is as fast as memory will go with a GTX 280. Either way, the GTX 280 has incredibly high memory bandwidth.
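The bandwidth figures in the spec table follow from the interface width and the effective memory speed. As a quick check, the sketch below uses the standard calculation (bytes per transfer multiplied by transfers per second) with the table’s numbers; the function name is our own.

```python
def bandwidth_gb_per_sec(bus_width_bits, effective_mhz):
    """Peak memory bandwidth: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz / 1000


# (interface width in bits, effective memory speed in MHz) from the spec table
cards = {
    "GeForce GTX 280": (8 * 64, 2214),   # eight 64-bit interface units
    "GeForce GTX 260": (7 * 64, 1998),   # one interface unit disabled
    "GeForce 9800 GX2 (per GPU)": (256, 2000),
    "GeForce 8800 GTX": (384, 1800),
}

for name, (width, speed) in cards.items():
    print(f"{name}: {width}-bit, {bandwidth_gb_per_sec(width, speed):.1f}GB/sec")
```

The results round to the 142GB/sec, 112GB/sec, 64GB/sec and 86.4GB/sec quoted in the table.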
Nvidia says it’s also upgraded the memory interface units of the GTX 200, with improved memory access patterns, improved caching algorithms and additional compression hardware. The latter compresses textures to reduce memory and memory bandwidth load.
TESTING AND RESULTS
With more
stream processors than previous GPUs, plus very high memory bandwidth, the GTX
280 should cope very well with high-resolution gaming and plenty of
AA. BFG and MSI both sent us GeForce GTX 280 cards for testing, and we wanted
to find out how much faster (if at all) the new card was than an Asus GeForce
9800 GX2 TOP, an overclocked version of Nvidia’s previous high-end GPU.
Both the BFG
and MSI cards should cost around £430 (prices have yet to be confirmed; expect an
update later today) while the Asus can be bought from Tekheads for £362.
Click here
for the benchmark results (opens in a new window).
For Age of Conan,
we raised all the view distance bars to maximum in order to fully stress the
graphics cards on test. The 9800 GX2 didn’t put in a bad performance, with
average frame rates considerably higher than the GTX 280 at both 1,680 x 1,050
and 1,920 x 1,200, but the minimum frame rates were very low. The GTX 280 offered a
far more consistent experience, with high minimum frame rates – 33fps at 1,920
x 1,200 – which meant no stutter in the game. The GeForce 9800 GX2 just
couldn’t cope with all the texture data flowing around with these draw
distances set so high. We are mindful that Conan is a new game however, and the
fact that the GeForce 9800 GX2 has to use SLI (as it’s a dual-GPU card) could
mean that the SLI profile for Conan isn’t up to scratch at the moment. This is
an issue we’ll return to later.
Crysis again showed that one massive chip such as the GTX 280 has many advantages over a dual-GPU product such as the 9800 GX2. The new GTX 280 could just about get away with playing the game at 1,920 x 1,200 and 4x AA – the native resolution of a 24in TFT – and that’s with all the detail settings on high, at which the game looks tremendous.
We wanted to see whether the GTX 280 could play Crysis
at the ‘very high’ settings available in DirectX 10 mode, and indeed it could,
albeit only at a much lower resolution. At 1,280 x 1,024 with 2x AA the GeForce 9800 GX2 proved the better card, as it ran the game with a minimum frame rate of 24fps and an average of 34fps.
At the same settings, the GeForce GTX 280 could only manage a minimum of 22fps
and an average of 31fps. Interestingly, disabling AA helped the GX2 but hardly
improved the GTX 280’s scores at all. At ‘very high’ settings Crysis looks
absolutely incredible – smoke hangs in the air after a firefight, light picks
through the trees and prickles the grass and objects such as weapons look lethally
realistic. The scenes look incredibly tangible, and if you’ve got the money for
a high-end graphics card, you’re in for a treat.
Call of Duty
4 is a fairly easy game for a high-end graphics card to run, but it’s also
highly optimised for multi-GPU setups. The GeForce 9800 GX2 is faster than the
GTX 280 in every test resolution by a noticeable degree.
Company of
Heroes: Opposing Fronts proved an interesting game to test. In DirectX 9 mode
the GTX 280 shades the 9800 GX2, especially as the resolution increases and the
massive amounts of memory bandwidth of the new GPU come into play. Switching to
DirectX 10 mode sees the GTX 280 pound the GeForce 9800 GX2 on minimum frame
rates - the GX2 just can’t keep the game data flowing quickly enough. The average
frame rates of the 9800 GX2 are good, but stutters are clearly visible when
playing the game, hence the very low minimum frame rates at every test
resolution.
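It may seem odd that a card can post a good average frame rate yet still stutter, so here is a small sketch with invented frame times (not measurements from our testing) showing how the two figures come apart. It treats the minimum frame rate as the rate implied by the single slowest frame, which is a simplification of how benchmark tools usually report it.

```python
def fps_stats(frame_times_ms):
    """Return (average fps, minimum fps) for a list of frame times in ms."""
    average_fps = 1000 * len(frame_times_ms) / sum(frame_times_ms)
    minimum_fps = 1000 / max(frame_times_ms)  # rate implied by the slowest frame
    return average_fps, minimum_fps


# Invented frame times in milliseconds, for illustration only
steady = [30, 31, 29, 30, 32, 30]
stuttery = [15, 15, 15, 15, 15, 100]  # mostly fast, with one long hitch

for name, times in (("steady", steady), ("stuttery", stuttery)):
    average, minimum = fps_stats(times)
    print(f"{name}: average {average:.1f}fps, minimum {minimum:.1f}fps")
```

The stuttery run wins on average frame rate, but its one long frame drags the minimum right down, and that hitch is what you notice when playing.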
Race Driver:
GRID proved another interesting test game, as the SLI profile was clearly not
up to scratch. If we wanted to run our benchmark more than once we had to exit
the game entirely or else it would crash. Nvidia did send out an SLI profile
update (unbidden as well, much to its credit) but this only improved stability
slightly and did nothing for the frame rate. GRID clearly therefore favours the
GTX 280 with high minimum and average frame rates for this card.
The 3DMark06
test was run for reference purposes, and we couldn’t get 3DMark Vantage to work
at all. That the GeForce 9800 GX2 outperformed the GTX 280 in 3DMark06 isn’t
too surprising as the GX2 has more mature drivers, but it’s still slightly
worrying for the new GPU. We believe the GTX 280 would have scored more highly
in Vantage as it has long shader programs, which the double-size registers of
the GTX 280 love.
NOISE, HEAT AND POWER
We should
also point out that the GeForce GTX 280 becomes incredibly loud as soon as you
wave a game engine anywhere near it, with the fan blowing a gale of hot air out
of the back of the dual-slot cooler. We also experienced some texture shimmer
in Crysis as we hadn’t used enough cooling on the back of the card. The rear
plate of the cooler acts as a heatsink for half of the memory, and needs a good
amount of airflow.
The GTX 280
has high power requirements too. Nvidia
recommends a 550W PSU capable of providing 40A at 12V for a single-card system and
doesn’t quote how much power an SLI or 3-Way SLI system will require.
CONCLUSION
When the GeForce 8800 GTX first came out, it was obviously head and shoulders above everything else and so we thoroughly endorsed it even though it was quite pricey. The GeForce GTX 280 is a much harder call. On one hand, it is actually outclassed in some games when it comes to average frame rates by a current graphics card - Nvidia's GeForce 9800 GX2. However, as a single GPU card, the performance of the GTX 280 is unmatched.
The GTX 280 has four things going for it –
huge memory bandwidth, a newer architecture, the fact that it’s a single
GPU card - and finally the fact that from what we understand, the GeForce 9800 GX2 is now end-of-life, and won't be available to buy in the very near future. The massive memory bandwidth and revised architecture should give the GTX 280 good
longevity, while the fact that it’s a single GPU will save you from SLI-related
troubles and teething issues with games that haven’t got an optimised SLI
profile yet. The fact that the 9800 GX2 is going to disappear from the market makes your choice simpler, too. The only question mark is the as-yet-untested GeForce GTX 260, which is expected to be a lot cheaper, so it could be better value - but correspondingly, it does also give up a lot of power to the GTX 280. There is also ATI's new Radeon architecture, the HD 4000-series, waiting in the wings. Samples and reviews should be available in the next few weeks, but for the time being, the GTX 280 is the highest-performance graphics card on the market. It doesn't completely blow away the 9800 GX2, but it is a step forward from a single 8800 GTX - it makes playing Crysis at high resolutions a reality and can take new games such as GRID and run them smoothly at incredibly demanding settings such as 2,560 x 1,600. This should be weighed against the noise and heat it makes while doing this job, and the fact that it is as costly as it is fast.
Thanks to BFG and MSI for supplying us with cards, and to Phil Hartup for help with the testing.
Test kit: 3.2GHz Intel
Core 2 Extreme QX9770 overclocked to 3.6GHz, Asus Striker II Extreme
motherboard, 4GB Corsair XMS DDR3 memory at 1,600MHz, 640GB Western Digital
Caviar SE16 hard disk, Windows Vista Ultimate 64-bit, GeForce 9800 GX2:
ForceWare 175.16, GeForce GTX 280: ForceWare 177.34
User Reviews
Smooth framerates at last
"Worth the dosh"
Just got one from Novatech for £269 (cheapest ive seen) and it flies. I had a 8800 ultra before which was great but had problems with crysis and call of duty 4, Put this in and tried it and everything runs smoothly. My 3d mark 06 score was 14,100 roughly with the 8800 and with this card it's up to around 17,500. doesnt sound much but it makes a big difference in games + i paid £130 less for this card than i did for the 8800
Review by: davelister
Average User Rating:
95%