- Review Cost: £425
In retrospect, the launch of the G80-based Nvidia GeForce 8800 GTX in November 2006 marked a paradigm shift in computer graphics. Not only was it the first DirectX 10 graphics card, but it also absolutely destroyed the DirectX 9 competition before Windows Vista had even arrived, and it held on to that advantage long after Vista and DirectX 10 had been released. The G80 in its numerous guises didn’t have any significant competition until February of this year, when AMD debuted the ATI Radeon HD 3870 X2, which put two RV670 cores on one card.
Not that the contest went on for very long. Within a short period of time, Nvidia released the dual G92-based 9800 GX2, which used a similar two-chips-on-one-card design to the HD 3870 X2 and easily reclaimed the top performance title, at least in the games it supported. The 9800 GTX, which Nvidia released as a follow-up, used a single G92-based chip to slightly increase Nvidia’s performance advantage in the single-chip graphics card market. Of course, ATI continued to produce many excellent cards and engaged in fierce competition in the popular sub-£150 market, but it was unable to take the top spot.
With the 9800 series cards, Nvidia did manage to maintain its lead, but it didn’t exactly break new ground. New features like HybridPower are useful, but the overall range felt a little underwhelming: performance was good but not outstanding.
However, only a few months later, Nvidia has just unveiled the GT200, a brand-new graphics processor that, at least on paper, appears to have all the capabilities required to be a viable replacement for the G80. It is an absolute monster, squeezing in 1.4 billion (yes, billion with a “B”) transistors, 240 stream processors, 32 ROPs, a 512-bit memory interface, and a slew of other under-the-hood upgrades. It is indeed a behemoth both inside and out, measuring 24 x 24mm, the largest single die TSMC has ever commercially produced. It is still made using the same 65nm process as the G92.
Indeed, a typical 300mm production wafer has room for only 94 GT200 chips. To get an idea of how big and expensive GT200 is, compare it with something like Intel’s Conroe CPUs, which are also built on a 65nm process but have a die size of only 143mm², allowing 426 dies to be produced per wafer.
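As a rough sanity check on those wafer figures, here is a minimal Python sketch using a common first-order approximation for gross dies per wafer. The function name is hypothetical, and the formula is a textbook estimate rather than anything Nvidia or TSMC has published: it ignores scribe lines, edge exclusion, and yield. It lands on 94 for the 576mm² GT200 and somewhat above the quoted 426 for Conroe.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Estimate whole dies per wafer with a first-order approximation.

    The first term is wafer area divided by die area; the second term
    subtracts the partial dies lost around the circular edge.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(wafer_area / die_area_mm2 - edge_loss)


# GT200: 24mm x 24mm die on a 300mm wafer.
gt200_dies = gross_dies_per_wafer(300, 24 * 24)
# Conroe: 143mm2 die on the same wafer size.
conroe_dies = gross_dies_per_wafer(300, 143)

print(gt200_dies, conroe_dies)
```

The gap between the two results is the point: each wafer yields well over four times as many of the smaller dies, before defects are even considered.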
The GT200 will come in two variations at launch, and these will be the first parts to use Nvidia’s new branding. The rebranding amounts to little more than a reshuffle: instead of the familiar x000 GT/GTX/GTS arrangement, the new cards are called GTX 280 and GTX 260.
The GTX 280 uses the full extent of GT200, with its shader clock running at 1296MHz, 1GB of GDDR3 memory running at 1107MHz (2.2GHz effective), and the rest of the chip purring along at 602MHz. With the whole lot drawing up to 236W, the GTX 280 needs not only a conventional six-pin PCI-Express power connector but an eight-pin one as well.
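The connector requirement follows from simple arithmetic, sketched below in Python. The wattage limits are the standard PCI-Express figures (75W from the slot, 75W per six-pin plug, 150W per eight-pin plug); the function and constant names are made up for illustration.

```python
# Power limits in watts as defined by the PCI-Express specification.
SLOT_W = 75
SIX_PIN_W = 75
EIGHT_PIN_W = 150


def available_power(six_pin: int = 0, eight_pin: int = 0) -> int:
    """Total power a card may draw from the slot plus auxiliary plugs."""
    return SLOT_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W


GTX_280_PEAK_W = 236  # peak draw quoted in the review

two_six_pin = available_power(six_pin=2)
six_plus_eight = available_power(six_pin=1, eight_pin=1)

# Two six-pin plugs fall short of 236W; six-pin plus eight-pin clears it.
print(two_six_pin, two_six_pin >= GTX_280_PEAK_W)
print(six_plus_eight, six_plus_eight >= GTX_280_PEAK_W)
```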
The GTX 260, which is due for release on June 26 (the GTX 280 will be on sale by the time you read this), has two SM clusters disabled (more on this on the following page) and one ROP partition removed. Clock speeds have also been throttled, resulting in vital statistics of 192 shaders running at 1242MHz, 28 ROPs running at 576MHz, and 896MB of GDDR3 memory running at 1000MHz. As a result of these speed and component cuts, the GTX 260 draws less power, 183W to be exact, and consequently makes do with two six-pin PCI-Express power connectors rather than needing an eight-pin one.
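For a sense of what those memory clocks mean in practice, here is a small Python sketch of peak memory bandwidth. The function name is hypothetical. The GTX 260's 448-bit bus width is an assumption: it is derived from seven remaining ROP partitions, each keeping the 64-bit memory channel described for the full chip later in the review, and is not a figure quoted here.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, memory_clock_mhz: float) -> float:
    """Peak bandwidth in GB/s for double-data-rate memory such as GDDR3."""
    transfers_per_second = memory_clock_mhz * 1e6 * 2  # two transfers per clock
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


# GTX 280: 512-bit interface, 1107MHz GDDR3.
gtx_280 = memory_bandwidth_gb_s(512, 1107)
# GTX 260: assumed 448-bit interface (7 partitions x 64 bits), 1000MHz GDDR3.
gtx_260 = memory_bandwidth_gb_s(448, 1000)

print(f"GTX 280: {gtx_280:.1f} GB/s, GTX 260: {gtx_260:.1f} GB/s")
```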
With the GTX 280 asking for £449 and the GTX 260 asking for £299, list prices are as outrageous as you might expect. Furthermore, early signs suggest that retailers won’t be discounting or undercutting each other enough to change those prices significantly. Hey, whatever.
We’ll take a closer look at the GTX 260 in a separate article, and we’ll poke around the physical GTX 280 card shortly. First, though, let’s look at what drives Nvidia’s newest graphical marvel.
Although the design of the GT200 shares many characteristics with the G80, numerous improvements have been made to this new core, making it a far superior overall product. Let’s return to the fundamentals first, though, before we go too far with the comparisons.
The following is what Nvidia calls a Streaming Multiprocessor, or SM:
As seen in the image, an SM consists of a number of so-called Streaming Processors (SPs), along with an instruction scheduler and some cache memory. There is a little more to it than that, including two Special Function Units (SFUs) that aren’t shown, but we won’t delve too deeply into the details. This tiny group essentially functions as a little eight-core CPU, with each core handling the calculations needed for a single pixel. The small (16KB) piece of memory holds only the specific data related to the eight pixels that the SM is presently working on. This is the basic building block of Nvidia’s unified shader architecture, and it is the same on G80, G92, and GT200.
[Image: TPC from GT200]
The first significant dividing line between the GT200 and the G80/G92 appears if we zoom out one step. A Texture/Processor Cluster (TPC) is made up of three of these SMs in GT200, as opposed to G80 and G92, which used two SMs for each TPC. The principle is the same; there is simply more of it in the new chip, with 24 SPs per TPC as opposed to 16 on G80.
[Image: TPC from G80/G92]
A TPC not only combines the SMs but also adds texture processing capabilities, which is another area in which the new chip somewhat differs from its predecessors. With G92, the number of address units doubled to eight while filtering remained at eight units, compared to G80’s four texture address units and eight texture filtering units per TPC. The situation has, uh, stayed the same with GT200.
Each TPC still has eight texture address units and eight filtering units; it is the ratio of shaders to texturing units that has changed. Therefore, even though each TPC’s shader count has increased by 50%, it still has the same amount of texturing power. The change in the ratio may initially appear to be a step backward, but most contemporary games are becoming more and more dependent on shaders. Furthermore, if you take a broader view, you’ll notice that the GT200’s overall texturing power has actually increased a little.
What more is there to say about Counter-Strike: Source than has already been said? Four years after its initial release, it is still one of the most played games in its genre and is simply “the” benchmark for team-based online shooters. It focuses on tiny environments and incredibly intense small-scale battles where one-shot kills are the norm, completely in contrast to Enemy Territory: Quake Wars. This game is the best way to test every aspect of your first-person shooting abilities at once.
We test the 32-bit version of the game using a custom time demo recorded during a match against bots on the cs_militia map. This is one of the most graphically demanding maps available because of the amount of foliage present, which significantly affects image quality and performance. Since this game depends heavily on quick, accurate reactions, we find that a framerate of at least 60 frames per second is necessary for serious gaming. Dropped frames are simply unacceptable in this game.
We test with 0xAA 0xAF, 2xAA 4xAF, and 4xAA 8xAF while all in-game settings are at their highest. Transparency anti-aliasing can also be manually enabled through the driver, but this is obviously only possible when normal AA is used in-game.
Although ATI’s HD 3870 X2 does a fair job of keeping up with the GTX 280, it can’t match Nvidia’s latest when it comes to performance. The GTX 280 is still in charge.
One of our favorite games from the previous year has to be Call of Duty 4. It demonstrated that first-person shooters didn’t require the best graphics or the longest game time. It also brought the Call of Duty brand up to date. You were on edge nonstop for the entire eight hours of pure adrenaline rush.
We test using the 32-bit version of the game patched to version 1.4, manually playing through a brief segment of the game’s second level while FRAPS records framerates. Despite the intense atmosphere, the gameplay is less frantic and doesn’t depend on quick decisions and swift movement, so we find that a frame rate of 30 fps is more than adequate.
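To make that pass mark concrete, here is a minimal Python sketch of how a run of recorded frame times could be reduced to an average framerate and checked against a target such as 30 fps. The function name and the sample frame times are hypothetical; they are not taken from FRAPS's own log format or from any result in this review.

```python
def summarize_frame_times(frame_times_ms, target_fps):
    """Reduce per-frame render times (in milliseconds) to simple statistics."""
    total_seconds = sum(frame_times_ms) / 1000
    average_fps = len(frame_times_ms) / total_seconds
    # The slowest single frame, expressed as an instantaneous framerate.
    slowest_fps = 1000 / max(frame_times_ms)
    return {
        "average_fps": average_fps,
        "slowest_fps": slowest_fps,
        "meets_target": average_fps >= target_fps,
    }


# Made-up sample: five frames taking 125ms in total.
sample_ms = [25.0, 20.0, 40.0, 25.0, 15.0]
summary = summarize_frame_times(sample_ms, target_fps=30)
print(summary)
```

An average can hide individual slow frames, which is why the sketch also reports the slowest frame; in a game where stutter matters, that figure may be the more telling one.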
We test with 0xAA and 4xAF, and all in-game options are set to their highest possible setting. Anti-aliasing for transparency can also be manually activated through the driver, but this is obviously only possible when standard AA is utilized in-game.
For some reason, the many enhancements made to GT200 simply didn’t help this game all that much, and the 9800 GX2 comfortably outpaces the GTX 280. Since Call of Duty 4 is a DirectX 9-based game, we hypothesize that it benefits little from the more DirectX 10-focused adjustments made to GT200. The GTX 280 still runs it flawlessly, though.
What sets Enemy Territory: Quake Wars apart from all of our other tests is that it uses the open-standard OpenGL API rather than Microsoft’s DirectX. It is a team-based first-person shooter set during a dystopian future war. As a player, you can choose from a huge variety of character types and playing styles, as well as a huge selection of vehicles. Battles can span sizable open areas and involve a lot of participants. Overall, it’s a multiplayer paradise.
The game’s 32-bit version, which has been patched to version 1.4, is used for testing. We use a bespoke time demo from the Valley level, which we believe to be the game’s most graphically demanding level. We believe this game needs a framerate of at least 50 fps due to the intense multiplayer action and quick mouse movement.
We test with 0xAA 0xAF, 2xAA 4xAF, and 4xAA 8xAF while all in-game settings are at their highest. Transparency anti-aliasing can also be manually activated through the driver, but this is obviously only possible when standard AA is utilized in-game.
The good news keeps coming with Enemy Territory, where the GTX 280 completely dominates the competition. Even the 9800 GX2 frequently has to take second place, and SLI scales this game really nicely.
The newest game in our testing library, Race Driver: GRID, is also one of our current favorites. It’s a fantastic pick-up-and-play driving game because it combines arcade-style thrills and spills with a healthy dose of realism and extras like Flashback. Its lighting, interactive crowds, destructible environments, and beautifully rendered settings make for stunning visuals. It is also not the most demanding game on hardware.
The 32-bit, unpatched, DirectX10-capable version of the game is used for testing. In a Pro Tuned race on regular difficulty, we manually finish one circuit of the Okutama Grand Circuit while using FRAPS to record frame rates. We find that to play this game satisfactorily, a framerate of at least 40 fps is necessary because severe stutters can sabotage your timing and accuracy. The track, barriers, and car bodies all suffer from significant aliasing and are a constant source of distraction, so we’d also recommend 4xAA as a minimum.
We test with 0xAA, 4xAA, and 8xAA while keeping all in-game settings at their highest possible setting. Anti-aliasing for transparency can also be manually activated through the driver, but this is obviously only possible when standard AA is utilized in-game.
Because we had very little time to test this game, we only have comparable results for a few cards. The 9800 GX2 struggles more than it should, indicating that Nvidia hasn’t had time to optimize its SLI drivers for this game. Overall, we can’t comment on the bigger picture. Nevertheless, it is abundantly evident from our brief testing that the GTX 280 is a fantastic option for playing this game.
The graphical fidelity of Crysis is still unmatched, making it the ultimate test for a graphics card even though it hasn’t been a huge commercial success and its gameplay isn’t particularly innovative. This game has all the eye candy you could ask for and then some, with its abundance of dynamic foliage, rolling mountain ranges, vivid blue seas, and large explosions.
The game’s 32-bit version, patched to version 1.1, is used for testing while operating in DirectX 10 mode. We use a unique time demo from the game’s opening minutes when the player is exploring the beach. Surprisingly, we discover that any frame rate above 30fps is generally adequate to play this game given its cramped setting and graphically rich environment.
For our test runs, all in-game options are set to high, and we test with both 0xAA and 4xAA. Anti-aliasing for transparency can also be manually activated through the driver, but this is obviously only possible when standard AA is utilized in-game.
Really, there isn’t much to say. The absolute best option for Crysis gaming is the GTX 280. In fact, it’s the first card we’ve seen that allows us to play this game at 2,560 x 1,600. It’s an excellent start.
As I hinted at earlier, Nvidia has been heavily promoting GPGPU alongside the GTX 280’s launch. Intriguing as this may be, it is still unlikely to matter much to anyone looking to purchase one of these cards. We will therefore leave the analysis of GPGPU performance until the GPGPU landscape has settled down and some sort of standards are in place. For now, let’s get those games rolling instead.
'''Common System Components'''
* Intel Core 2 Quad QX9770
* Asus P5E3
* 2GB Corsair TWIN3X2048-1333C9 DDR3
* 150GB Western Digital Raptor
* Microsoft Windows Vista Home Premium 32-bit
* GTX280: Forceware 177.34
* Other nVidia cards: Forceware 175.16
* ATI: Catalyst 8.4
'''Cards Tested'''
* nVidia GeForce GTX 280
* nVidia GeForce 9800 GTX
* nVidia GeForce 9800 GX2
* nVidia GeForce 8800 GTX
* nVidia GeForce 8800 GTS 512
* ATI HD 3870 X2
'''Games Tested'''
* Crysis
* Race Driver: GRID
* Enemy Territory: Quake Wars
* Call of Duty 4
* Counter-Strike: Source
With the exception of the Zotac sticker, the first card we received for review has the exact same design as Nvidia’s reference board, so that is the basis on which we will be assessing it. When we conduct a roundup in the near future, we will discuss the specifics of the Zotac board as well as a number of other partner cards.
The GTX280 card is around the same length as the 9800 GX2 at 267mm. It is totally encased in a metal shroud, just like the GX2. This safeguards sensitive electronics from potential harm from static or general bumps and scrapes, a development we warmly embrace.
Like all of Nvidia’s most recent high-end cards, the GTX 280 has a dual-slot heatsink/fan design that uses the slightly off-parallel fan alignment that debuted with the 8800 GTS 512. The cooler performs admirably, as we have come to expect. It is near-silent at idle, and while it does become audible under load, the sound is a soothing whoosh rather than an obnoxious buzz or high-pitched shriek. The card does get very hot and will need a case with good ventilation to prevent stability issues, but this is something we would fully expect from a high-end graphics card.
As previously mentioned, peak power draw is a massive 236W. This is a worst-case scenario, though, and Nvidia has employed some effective power-saving measures. As a result, idle power is only 25W and power usage during accelerated video playback is only 32W. These are impressive figures that make you question the merits of HybridPower, given that the chipsets that support that power-saving feature use a fair amount of power themselves.
Even though the card “can” draw very little power, it still won’t function without both auxiliary PCI-Express power sockets being properly connected. An LED on the expansion bracket indicates whether the card has enough power, glowing red if it doesn’t. Nvidia hasn’t carried over the glowing PCI-Express sockets it used on the GX2, but they were more of a “bling” feature than a functional requirement.
On the card’s top edge, rubber flaps cover the SLI connectors and an S/PDIF socket. The former enable dual- and triple-SLI configurations, while the latter lets digital audio be carried over the video connections. This supports two-channel LPCM at up to 192kHz, six-channel Dolby Digital at up to 48kHz, and DTS 5.1 at up to 96kHz. Eight-channel LPCM, Dolby TrueHD, and DTS Master Audio are glaring omissions, but it is still sufficient for all but the most elaborate home theatre setups. A DVI-to-HDMI adapter is supplied to make use of this.
Two dual-link DVI-I connectors and a seven-pin analog video connector serve as the standard outputs, and the latter supports S-Video natively as well as composite and component via a break-out dongle. Both DVI connections are compatible with HDCP encryption and can be used to playback HD content that is copy-protected, such as Blu-ray discs.
Video acceleration is at the same level as on the 9000-series, with H.264, VC-1, and MPEG-2 all benefiting from GPU acceleration. There are also the recently introduced, and questionably useful, image post-processing features: dynamic contrast enhancement and blue, green, and skin tone enhancements.
One of Nvidia’s major marketing initiatives this year has focused on raising the profile of General-Purpose computing on Graphics Processing Units (GPGPU), which means using a GPU to perform computation unrelated to 3D graphics. Nvidia was therefore eager to highlight the superior GPGPU capabilities of its latest chip at the GT200 launch.
Because all those shaders can also be used as mini CPUs, GPUs in general are ideal for parallel computing tasks like image processing and video conversion. Individually, the shaders may be inferior to a proper CPU, but when you have 240 of them, as you do with GT200, their sheer brute force will easily outperform any CPU. The main issue right now is how challenging it is to write software that takes advantage of parallel processing, especially parallel processing on a GPU. This is what prompted Nvidia to begin developing its CUDA Software Development Kit (SDK), which Hugo recently discussed and which greatly simplifies GPGPU programming for the coder.
As well as offering CUDA as a general programming platform, Nvidia has also recently acquired Ageia, the company behind the PhysX physics processor, and integrated its technology into the CUDA SDK. As a result, Nvidia GPUs can now be used to produce lifelike physical effects as well as lifelike visual effects.
Additionally, all of Nvidia’s GPUs since the 8800 GTX have supported CUDA, giving it a massive installed base of 70 million users. This has caused some pretty well-known developers to sit up and take notice of CUDA, including Adobe, which will be using GPU acceleration in its upcoming releases of Photoshop and Premiere.
Of course, AMD has been working on its own CUDA rival as well, known as the Close To Metal (CTM) SDK. The uptake of this, however, has been somewhat less enthusiastic. AMD also plans to support the Havok physics engine and has yet to adopt PhysX, so the true state of play with regard to GPGPU is still very uncertain, and for the time being I’d advise taking the whole thing with a grain of salt. That said, for those who are still interested, the GT200 significantly improves upon earlier Nvidia efforts.
Viewed from a GPGPU perspective, the GT200 looks like the following. The TPCs become miniature 24-core CPUs, each with its own local cache memory. The massive compute load is split up among all the different TPCs by a thread scheduler, and the frame buffer memory serves as the main system memory.
Now, Nvidia went into great detail in its briefings about why the GT200 is superior to every Nvidia GPU that came before it in terms of GPGPU. However, rather than any significant new design, a large portion of the improvement is attributable to the simple increase in processing units. The end result is an increase in processing power from G80’s 518 GFLOPS to GT200’s 933 GFLOPS.
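Those headline figures can be reproduced with a short Python sketch. It assumes each stream processor can issue a multiply-add plus a multiply every clock, which counts as three floating-point operations, and it assumes a 1350MHz shader clock for the G80-based 8800 GTX, a figure that is not quoted in this review. The function name is hypothetical.

```python
def peak_gflops(stream_processors: int, shader_clock_mhz: float,
                flops_per_clock: int = 3) -> float:
    """Theoretical single-precision peak in GFLOPS.

    flops_per_clock defaults to 3: one multiply-add (2 operations)
    plus one multiply (1 operation) per SP per clock.
    """
    return stream_processors * shader_clock_mhz * flops_per_clock / 1000


# GT200 as used in the GTX 280: 240 SPs at 1296MHz.
gt200 = peak_gflops(240, 1296)
# G80 as used in the 8800 GTX: 128 SPs at an assumed 1350MHz.
g80 = peak_gflops(128, 1350)

print(f"GT200: {gt200:.2f} GFLOPS, G80: {g80:.2f} GFLOPS")
```

These are theoretical peaks; real workloads rarely keep every unit busy on every clock.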
However, there are also a few architectural upgrades. First, thread scheduling has been enhanced to enable more effective execution of dual-issue MAD+MUL functions. Double-precision (64-bit) calculations are now supported as well. These don’t run on the SPs themselves, though, but on 30 dedicated double-precision processors (one per SM), so double-precision performance is only one-twelfth that of single-precision. Finally, four “Atomic” units have been added. These are designed to handle specific atomic read-modify-write commands with direct access to memory, rather than going through the chip’s own caches.
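The one-twelfth figure can be checked with a little per-SM arithmetic, sketched here in Python. The single-precision side uses the MAD+MUL dual issue (three operations per SP per clock); the double-precision side assumes each dedicated unit completes one multiply-add (two operations) per clock, which is an assumption on our part rather than a figure quoted above. The constant names are hypothetical.

```python
SPS_PER_SM = 8            # single-precision stream processors per SM
DP_UNITS_PER_SM = 1       # dedicated double-precision units per SM
SM_COUNT = 30
SHADER_CLOCK_MHZ = 1296

SP_FLOPS_PER_CLOCK = 3    # MAD (2 operations) + MUL (1 operation)
DP_FLOPS_PER_CLOCK = 2    # assumed: one multiply-add per clock

single_gflops = SM_COUNT * SPS_PER_SM * SP_FLOPS_PER_CLOCK * SHADER_CLOCK_MHZ / 1000
double_gflops = SM_COUNT * DP_UNITS_PER_SM * DP_FLOPS_PER_CLOCK * SHADER_CLOCK_MHZ / 1000

# Ratio of single-precision to double-precision peak throughput.
ratio = single_gflops / double_gflops
print(f"{single_gflops:.2f} {double_gflops:.2f} {ratio:.1f}")
```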
Overall, though, much of this involves really difficult mathematics and its optimization, and right now not much of it is pertinent to the average customer. We’ll revisit these issues once mainstream GPGPU applications start to appear on store shelves and assess whether they actually make a difference.
With all that theory out of the way, let’s examine the GTX 280, the first consumer card built on the GT200.
Taking a few more steps back, we arrive at the following diagram.
This image shows 10 TPCs making up the chip’s shader power in a section Nvidia calls the Streaming Processor Array (SPA). In G80 and G92, the SPA consisted of only eight TPCs, for a total of 128 SPs. With the increase in SPs per TPC and in TPCs per SPA, GT200 ends up with a total of 240 SPs, which I’m sure you’d agree is a significant boost. The increase in TPCs also accounts for the gain in texturing power mentioned earlier. With two additional TPCs come two additional blocks of texturing units, for a total of 80 texture address units and 80 texture filtering units, compared to 64 of each on G92, and 32 address units and 64 filtering units on G80.
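The stream processor totals fall straight out of the hierarchy described so far. The following Python sketch models it with a small, hypothetical data class (the class and field names are ours, not Nvidia's) and multiplies the levels together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderArray:
    """A Streaming Processor Array described by its TPC/SM/SP hierarchy."""
    name: str
    tpcs: int
    sms_per_tpc: int
    sps_per_sm: int = 8  # every SM holds eight SPs on G80, G92, and GT200

    @property
    def sms(self) -> int:
        return self.tpcs * self.sms_per_tpc

    @property
    def sps(self) -> int:
        return self.sms * self.sps_per_sm


g80 = ShaderArray("G80/G92", tpcs=8, sms_per_tpc=2)
gt200 = ShaderArray("GT200", tpcs=10, sms_per_tpc=3)

for chip in (g80, gt200):
    print(f"{chip.name}: {chip.sms} SMs, {chip.sps} SPs")
```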
The shader-thread dispatch logic, along with the raster/setup engine, is located above the SPA and is responsible for managing the task of dividing up the enormous number of calculations into TPC-sized chunks.
Eight ROP partitions sit underneath the SPA and handle per-pixel operations like alpha-blending and anti-aliasing. Each ROP partition can process four pixels per clock, for a total of 32 pixels per clock across the entire chip. The new ROPs have also been tweaked to enable full-speed blending (i.e. 32 pixels per clock), whereas the G80 could output 24 pixels per clock but blend only 12. As a result, the GT200 should be faster at anti-aliasing, particle effects, shadows, and other similar effects. Each ROP partition has its own small L2 cache and a dedicated 64-bit connection to the frame buffer, making for a total memory interface that is 512 bits wide. Or, in other words, enormous!
Improvements to the geometry shading and Z-occlusion culling performance have also been made elsewhere. Additionally, the driver-hardware communication has been improved, potentially eliminating performance-impacting bottlenecks.
All things considered, these modifications produce some astounding raw performance numbers. Over G80, pixel throughput has grown by 33.3%, texturing capability by 25%, and shader processing power by 87.5%. Some of the stats seem less spectacular when compared to the dual-chip cards that ATI and Nvidia both just unveiled, but there are two factors to keep in mind. First, performance from the two chips involved in the dual card solutions is assumed to be perfectly doubled, which is rarely the case in real life. Second, single-chip solutions like the GT200 will give you a guaranteed level of performance, whereas these dual-chip cards only offer performance increases for the games with which they are compatible.
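The percentages above can be re-derived from unit counts given elsewhere in this review, as in the Python sketch below. The helper name is hypothetical, and the texturing totals are computed from the eight filtering units per TPC described earlier, multiplied by eight TPCs for G92 and ten for GT200; on that basis the 25% texturing gain corresponds to the step from G92, whose filtering count G80 shares.

```python
def percent_gain(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


# ROP output in pixels per clock: 24 on G80, 32 on GT200.
pixel_gain = percent_gain(24, 32)
# Texture filtering units: 8 per TPC, with 8 TPCs before and 10 now.
texture_gain = percent_gain(8 * 8, 10 * 8)
# Stream processors: 128 on G80, 240 on GT200.
shader_gain = percent_gain(128, 240)

print(f"pixel {pixel_gain:.1f}%, texture {texture_gain:.1f}%, shader {shader_gain:.1f}%")
```

These are per-clock unit counts only; they take no account of the different clock speeds the chips run at.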
The logical picture is thus presented, but how does it connect to the enormous chip that we previously saw? Check out the details below.
This image of the GT200 shows it with its various computing components highlighted. The unmarked area in the middle serves a number of functions, but its main goal is to manage the rest of the chip, so it also contains elements like the thread scheduler and raster setup.
One final word on DirectX 10.1. Simply put, it’s a shame that GT200 doesn’t support it. Even though 10.1 only makes a few tweaks and doesn’t add any new features to Microsoft’s gaming API, it sometimes boosts performance and efficiency. The only thing working in Nvidia’s favor in this situation is that few developers have used these new tweaks so far. This won’t continue to be the case, though. All we can do now is wait and see how this one turns out.
It is abundantly clear that the Nvidia GeForce GTX 280 performs incredibly well, and this performance is consistent, unlike that of the recent rash of dual-chip cards we’ve seen. It may not completely destroy everything that came before it, as the 8800 GTX did when it first appeared, but it is still a worthwhile upgrade for those looking to replace their 8800 GTXs, particularly because more and more games will come to rely on the additional shader hardware that the GT200 offers.
We also like the card’s physical design, which keeps the tried-and-true black casing and a fantastic cooler from earlier cards while adding LEDs to indicate the right power configuration and a protective covering for the entire card. We really can’t find anything wrong with the GTX280, with the exception of the fact that ATI continues to rule when it comes to how audio pass-through is enabled.
Peak power draw is one area where the GTX 280 inevitably falls down, but even then, it is only as high as we would expect and is still lower than that of some cards that have come before it. Factor in the extremely low idle and video-decoding power consumption, along with support for HybridPower, and your electricity bill shouldn’t be a major concern.
The one aspect that truly worries us is price, as Nvidia has really tightened the screws and is trying to wring every last penny of profit out of the market while it still holds the performance advantage. It’s not a shocking move, and we have no doubt that its rivals would do the same if given the chance. It’s still regrettable, though. Even though the GTX 280 is unquestionably the fastest card available, it isn’t all that much faster, and certainly not by enough to warrant paying twice the price of a 9800 GTX. In fact, if we were to suggest anything at this time, it would be to go out and purchase two 9800 GTX cards and use them in SLI. Either that or wait until we review the GTX 260 to see how it compares.
When it comes to performance, the GTX 280 is faultless, and it has every feature that the majority of gamers should care about. Unfortunately, Nvidia has been unreasonable with the pricing, so we are unable to advise purchasing one at the £400–£450 asking price.