We haven’t published any articles this year yet, as we wanted to start with something big. So here we begin a series of posts and live meetups on General-Purpose Computing on Graphics Processing Units, or GPGPU for short.

Contents

The intention was to benchmark the most commonly used data science libraries and suggest the most relevant tools for 2022. Here is what we came up with:

  1. NumPy vs CuPy, here. Deeper investigation in the C++/CUDA land:
    1. 879 GB/s Reductions in C++ and CUDA, here.
    2. Failing to Reach DDR4 Bandwidth, here.
  2. NetworkX vs RetworkX vs cuGraph, coming soon.
  3. Pandas vs Modin vs cuDF, coming soon.
  4. TensorFlow vs PyTorch vs JAX, coming soon.

Hardware

Enough has been said about CPU vs GPU hardware differences, so we won’t cover the basics. Let’s check the specifics. We took the most potent pieces a person could buy in a retail store and put them head to head.

AMD Threadripper PRO 3995WX

Our CPU of choice in 2022 is AMD Threadripper PRO 3995WX:

  • priced at around $7'000 on Amazon.
  • equipped with 64 cores, 128 threads, and 256 MB of L3 cache.
  • TDP: 280 Watts.

Nvidia RTX 3090

Our GPU is Nvidia RTX 3090, Founders Edition:

  • with a recommended price of $1'500, these GPUs are now 3x the price on the market.
  • equipped with 10'496 CUDA cores at 1.4 GHz base clock and 24 GB of GDDR6X memory.
  • TDP: 350 Watts.

A complete review of the host machine was given in the Yahoo Cloud Serving Benchmark.

A non-PRO version of the CPU would cost a couple thousand less but reaches the same numbers in the following benchmarks. All in all, those devices are comparable in price, power consumption, and novelty. Aside from the basic GPU specs, we must remind you that RTX 30 cards come with 2nd Gen Ray Tracing Cores and 3rd Gen Tensor Cores. The former are irrelevant for this story, and the Tensor Cores come with an asterisk.
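
To make "comparable" a bit more concrete, the figures quoted above can be put side by side. This is only a back-of-the-envelope sketch in Python: the GPU street price is assumed to be exactly 3x the recommended price, and the names `cpu`, `gpu`, and `ratio` are ours, used purely for illustration.

```python
# Figures quoted in the text. The GPU street price is an assumption:
# exactly 3x the recommended price of $1'500.
cpu = {"name": "Threadripper PRO 3995WX", "price_usd": 7000, "tdp_watts": 280}
gpu = {"name": "RTX 3090", "price_usd": 1500 * 3, "tdp_watts": 350}


def ratio(key):
    """Return the CPU-to-GPU ratio for one of the numeric fields."""
    return cpu[key] / gpu[key]


if __name__ == "__main__":
    print(f"price ratio (CPU/GPU): {ratio('price_usd'):.2f}")
    print(f"TDP ratio (CPU/GPU):   {ratio('tdp_watts'):.2f}")
```

Under those assumptions, both ratios stay within a factor of two, which is the sense in which the two devices are treated as peers here.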

Ampere Tensor Cores Capability

One of the primary features of the newer hardware is support for the rarely seen i8 and i4 matrices. Those are increasingly popular in AI inference, especially if you don’t implement them yourself but rely on the Triton Server instead. Those may also become popular in AI training, but none of the big frameworks (PT, TF, JAX) support sub-byte matrices yet.
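
For readers who have not met sub-byte types before, here is a minimal sketch of what "i4" means in storage terms: two signed 4-bit integers share one byte. The helper names `pack_i4` and `unpack_i4` and the low-nibble-first layout are our own assumptions for illustration. They are not the layout used by Tensor Cores or by any of the frameworks mentioned above.

```python
def pack_i4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first."""
    values = list(values)
    if len(values) % 2:
        values.append(0)  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in a signed 4-bit integer")
        # Two's-complement nibbles: mask to 4 bits, shift the second one up.
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_i4(packed, count):
    """Recover `count` signed 4-bit integers from bytes produced by pack_i4."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Nibbles 8..15 encode the negative values -8..-1.
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:count]


if __name__ == "__main__":
    row = [-8, -1, 0, 7, 3]
    packed = pack_i4(row)
    print(len(row), "values stored in", len(packed), "bytes")
    print(unpack_i4(packed, len(row)))
```

The point of the sketch is only the storage density: an i4 matrix takes half the memory of an i8 one, which is why no standard byte-addressed array type maps onto it directly.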


Let’s start!