For the past 12 months, Nvidia has slow-walked its GP100 GPU — that’s Pascal, but equipped with HBM2 and a full suite of technological features meant to appeal to the HPC and scientific communities — into its high-end markets. Initially, the card was only available with a passive cooler and was intended solely for use in high-end servers. The first Quadro cards to use Pascal GPUs were the P5000 and P6000, but these were derived from GP102 and combined a Pascal-class graphics processor with GDDR5X, the memory technology Nvidia used to good effect with its GeForce 10-series GPU family. Now Nvidia has announced the Quadro GP100 — an ultra-high-end GPU based on the full Pascal implementation, with HBM2 attached as well.
Traditionally, Nvidia’s highest-end cards slot into the Tesla and Quadro product lines, which don’t always follow the consumer cycle’s refresh patterns. This was particularly true with Maxwell, which debuted in the Quadro family, but didn’t replace all of Nvidia’s Kepler-derived parts in the Tesla division. There were some questions about whether Nvidia would bring the GP100 GPU to Quadro at all, since the P5000 and P6000 shipped months ago and offered more than enough horsepower. In fact, the Quadro GP100 offers lower peak FLOPS than the GP102-derived P6000. GP100 is a 3584:224:128 card (CUDA cores:texture units:render outputs), compared with GP102 at 3840:240:96.
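That FLOPS gap follows directly from the shader configurations. Peak single-precision throughput is conventionally estimated as 2 FLOPs (one fused multiply-add) per CUDA core per clock; a quick sketch using the core counts above makes the point. The boost clocks here are illustrative assumptions for the sake of the arithmetic, not official figures:

```python
# Rough peak FP32 throughput: 2 FLOPs (one FMA) per CUDA core per clock.
# Core counts are from the article; clock speeds are illustrative
# assumptions, not Nvidia's official specifications.

def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Theoretical peak single-precision throughput in TFLOPS."""
    return 2 * cuda_cores * boost_clock_ghz / 1000.0

quadro_gp100 = peak_fp32_tflops(3584, 1.44)  # assumed ~1.44GHz boost
quadro_p6000 = peak_fp32_tflops(3840, 1.64)  # assumed ~1.64GHz boost

print(f"Quadro GP100: {quadro_gp100:.1f} TFLOPS")  # ~10.3
print(f"Quadro P6000: {quadro_p6000:.1f} TFLOPS")  # ~12.6
```

With any plausible clock pairing in this range, the GP102-derived P6000 comes out ahead on raw FP32 throughput, which is why the Quadro GP100's appeal rests elsewhere.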
Where GP100 distinguishes itself is in three areas. First, it offers 128 render outputs as opposed to 96 on GP102, though it’s by no means clear that any high-end workstation workloads will benefit from this. Second, it offers an estimated 720GB/s of memory bandwidth, up substantially from the 432GB/s on GP102. Third, it has custom brackets that support NVLink connections, meaning you can hypothetically hook two Quadro GP100 GPUs together in the same machine, with up to 80GB/s of aggregate bandwidth (40GB/s per link, with two link brackets).
But pushing GP100 out to the Quadro family sets up another potential collision between GP100 and GP102 — total memory capacity. GP100 tops out at 16GB, as all first-generation HBM2 cards are expected to do, while GP102 offers up to 24GB of GDDR5X. One reason why the workstation and HPC markets have an appetite for additional RAM is because many applications must be able to load their entire working data sets into memory before they can render them. How far this holds depends partly on the application and which version(s) of CUDA it supports, but it’s been a persistent limitation when we’ve previously done workstation testing or spoken to various vendors. Their differing capacities suggest that some markets will remain better-served by the GP102-derived P6000 with its 24GB of RAM than the 16GB on Quadro GP100.
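The capacity trade-off is easy to see with a toy example. The 20GB data set below is hypothetical; the cards' capacities are the 16GB and 24GB figures discussed above:

```python
# Illustrative sketch: does a working set fit entirely in GPU memory?
# The 20GB data set is a hypothetical example; the VRAM capacities are
# the Quadro GP100 (16GB) and P6000 (24GB) figures from the article.

def fits_in_vram(dataset_bytes: int, vram_gb: int) -> bool:
    """True if the whole data set fits in the card's memory."""
    return dataset_bytes <= vram_gb * 1024**3

dataset = 20 * 1024**3  # hypothetical 20GB working set

print(fits_in_vram(dataset, 16))  # Quadro GP100 -> False
print(fits_in_vram(dataset, 24))  # Quadro P6000 -> True
```

For a workload like this, the extra 8GB on the P6000 is the difference between running entirely on-card and not running at all.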
But then again, there’s opportunity here, too. When reviews of the new chip surface, we’ll get a look at the relative strengths of HBM2 and perhaps a clearer picture of where huge amounts of memory bandwidth improve performance and where they make little difference. While the two cards are not strictly apples-to-apples, they’re likely the closest look we’ll ever get at the “same” chip running on two different memory interfaces. If the Quadro GP100 proves to have 128 ROPs, that will tilt fill-rate-heavy tests in favor of the HBM2 chip. But this can be mitigated by workload choice — applications that don’t stress the 96 ROPs on the P6000 won’t stress the 128 ROPs on GP100, either.
AMD has already stated that its upcoming Vega GPU refresh will use HBM2, as will future high-end FirePro cards based on that architecture. Nvidia has yet to reveal details concerning any refreshed Pascal GPUs that might arrive in 2017.