CPU vs GPU Benchmarks Series
AMD Threadripper PRO 3995WX vs Nvidia RTX 3090: What is better for number crunching?
We haven’t published any articles this year yet, as we wanted to start with something big. So here we begin a series of posts and live meetups on General-Purpose Computing on Graphics Processing Units, or GPGPU for short.
Contents
The intention was to benchmark the most commonly used data science libraries and suggest the most relevant tools for 2022. Here is what we came up with:
- NumPy vs CuPy, here. Deeper investigation in the C++/CUDA land:
- Pandas vs Modin vs Spark vs Arrow vs cuDF, here.
- NetworkX vs RetworkX vs cuGraph, coming soon.
- TensorFlow vs PyTorch vs JAX, coming soon.
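To give a flavor of the first comparison in the list, here is a minimal sketch of timing the same matrix multiplication with NumPy and, if it is installed and a CUDA device is visible, with CuPy. This is not the harness used in the series: the helper name `time_matmul`, the matrix size, and the repeat count are illustrative assumptions.

```python
import time

import numpy as np

try:
    import cupy as cp
    # Fall back to CPU-only if CuPy is installed but no CUDA device is visible.
    if cp.cuda.runtime.getDeviceCount() == 0:
        cp = None
except Exception:
    cp = None


def time_matmul(xp, n=512, repeats=3):
    """Return the best wall-clock time (seconds) of an n x n float32 matmul.

    `xp` is the array module to use: `numpy` or `cupy` (hypothetical helper).
    """
    a = xp.ones((n, n), dtype=xp.float32)
    b = xp.ones((n, n), dtype=xp.float32)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        _ = a @ b
        if xp is not np:
            # GPU kernels launch asynchronously, so wait for completion
            # before stopping the clock.
            cp.cuda.Stream.null.synchronize()
        best = min(best, time.perf_counter() - start)
    return best


timings = {"numpy": time_matmul(np)}
if cp is not None:
    timings["cupy"] = time_matmul(cp)

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms")
```

Taking the best of several runs hides one-off warm-up costs, such as the first CUDA kernel launch, which would otherwise skew a single measurement.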
Hardware
Enough has been said about CPU vs GPU hardware differences, so we won’t cover the basics. Let’s check the specifics. We took the most potent pieces a person could buy in a retail store and put them head to head.
Our CPU of choice in 2022 is the AMD Threadripper PRO 3995WX:
- priced at around $7'000 on Amazon.
- equipped with 64 cores, 128 threads, and 256 MB of L3 cache.
- TDP: 280 Watts.
Our GPU is the Nvidia RTX 3090, Founders Edition:
- with a recommended price of $1'500, these GPUs now sell for 3x that on the market.
- equipped with 10'496 CUDA cores at 1.4 GHz base clock and 24 GB of GDDR6X memory.
- TDP: 350 Watts.
A complete review of the host machine was given in the Yahoo Cloud Serving Benchmark.
A non-PRO version of the CPU would cost a couple thousand dollars less but reaches the same numbers in the following benchmarks. All in all, those devices are comparable in price, power consumption, and novelty. Aside from the basic GPU specs, we must remind you that RTX 30 cards come with 2nd Gen Ray Tracing Cores and 3rd Gen Tensor Cores. The former are irrelevant for this story, and the Tensor Cores come with an asterisk.
One of the primary features of newer tech is support for rarely seen i8 and i4 matrices.
Those are increasingly popular in AI inference, especially if you don’t implement them yourself but rely on the Triton Server instead.
Those may also become popular in AI training, but none of the big frameworks (PT, TF, JAX) support sub-byte matrices yet.
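For readers who haven't met these formats, here is a small NumPy sketch of what i8 and i4 matrices mean in terms of data layout: an i8 product accumulated in 32-bit integers, and i4 values packed two per byte. It illustrates the idea only and says nothing about how Tensor Cores or any framework implement it. The function names and the low-nibble-first packing order are assumptions made for this example.

```python
import numpy as np


def matmul_i8(a, b):
    """Multiply two int8 matrices, accumulating in int32 to avoid overflow."""
    return a.astype(np.int32) @ b.astype(np.int32)


def pack_i4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first.

    The packing order is an assumption for illustration; real libraries
    may lay the nibbles out differently.
    """
    v = np.asarray(values, dtype=np.int8)
    if v.size % 2 != 0 or v.min() < -8 or v.max() > 7:
        raise ValueError("need an even number of values in the range -8..7")
    nibbles = (v & 0x0F).astype(np.uint8)  # keep the low 4 bits of each value
    return nibbles[0::2] | (nibbles[1::2] << 4)


def unpack_i4(packed):
    """Inverse of pack_i4: recover the signed 4-bit values as int8."""
    packed = np.asarray(packed, dtype=np.uint8)
    nibbles = np.empty(packed.size * 2, dtype=np.uint8)
    nibbles[0::2] = packed & 0x0F
    nibbles[1::2] = packed >> 4
    # Sign-extend: flip the sign bit of the nibble, then subtract its weight.
    return (nibbles ^ 8).astype(np.int8) - 8


a = np.full((2, 2), 127, dtype=np.int8)
product = matmul_i8(a, a)  # each entry needs more than 8 bits

weights = np.array([-8, -1, 0, 7], dtype=np.int8)
packed = pack_i4(weights)  # four values stored in two bytes
restored = unpack_i4(packed)

print(product.dtype, packed.nbytes, restored)
```

The wider accumulator is the important design choice: the inputs are tiny, but sums of their products quickly exceed the 8-bit range.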
Let’s start!