Why would an AI research team spend years designing new storage infrastructure? Well, let’s look at the hardware.

A high-end server like the DGX A100 today costs around $300'000 and comes with the following components:

  • 2x CPUs (with shared RAM),
  • 8x GPUs (each with private VRAM) and
  • 2x SSDs.

Almost the entire cost of the hardware is the cost of logic and volatile memory (RAM). RAM is fast, but also small. Chances are, your computer spends most of its time simply fetching and sending data instead of actually “computing”.

Channel                            Bandwidth     Memory Volume / Socket
CPU ⇌ DDR4 RAM                     ~100 GB/s     1'000 GB
GPU ⇌ HBM2 VRAM                    ~1'000 GB/s   300 GB
GPU ⇌ GPU via NVLink               ~300 GB/s     -
CPU ⇌ GPU via PCI-E Gen3 x16       ~15 GB/s      -
CPU ⇌ SSD via NVMe PCI-E Gen3 x4   ≪ 4 GB/s      1'000'000 GB
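To put these bandwidths in perspective, here is a back-of-envelope sketch of how long it takes to move 1 TB over each channel. The numbers are simply the approximate peak figures from the table above; the dictionary names are our own labels:

```python
# Back-of-envelope: seconds to move 1 TB over each link,
# using the approximate peak bandwidths from the table above.
TB = 1_000  # in GB

channels_gbps = {
    "CPU <-> DDR4 RAM": 100,
    "GPU <-> HBM2 VRAM": 1_000,
    "GPU <-> GPU (NVLink)": 300,
    "CPU <-> GPU (PCI-E Gen3 x16)": 15,
    "CPU <-> SSD (NVMe, PCI-E Gen3 x4)": 4,
}

for name, gbps in channels_gbps.items():
    print(f"{name:36s} {TB / gbps:8.1f} s per TB")
# The SSD link needs ~250 s per TB - 25x longer than RAM,
# and 250x longer than the GPU's own HBM2.
```

And that 250 s figure assumes the SSD actually sustains its theoretical peak, which, as the next paragraph shows, it rarely does.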

For us, Big Data begins where RAM capacity ends. We often store 10-1'000 TB of data on a single machine and need to navigate and analyze it in real time. The CPU ⇌ SSD link is the slowest in the chain. Its theoretical throughput (for Gen3 PCI-E) is 4 GB/s, but real-world reads on small files often degrade to 200 MB/s. The reality is even more depressing if you store your data in a structured DBMS: those mix frequent random read and write operations, which slows SSDs down to 10 MB/s.
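This degradation is easy to probe yourself. Below is a minimal, hypothetical sketch (the `read_throughput` helper is our own illustration, not UnumDB code): many small `pread`s at random offsets approximate DBMS-style access, while large offset-ordered reads approximate a sequential scan. It requires a POSIX system, and the OS page cache will inflate both numbers; for honest results on Linux you would use `O_DIRECT` or drop the caches first.

```python
import os
import random
import time

def read_throughput(path, block_size, random_access, budget_s=1.0):
    """Read from `path` for roughly `budget_s` seconds; return MB/s.

    random_access=True  -> small reads at random offsets (DBMS-like).
    random_access=False -> walk the file in sequential blocks (scan-like).
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    total, offset = 0, 0
    start = time.perf_counter()
    while time.perf_counter() - start < budget_s:
        if random_access:
            offset = random.randrange(0, max(1, size - block_size + 1))
        total += len(os.pread(fd, block_size, offset))
        if not random_access:
            offset = (offset + block_size) % size
    os.close(fd)
    return total / (time.perf_counter() - start) / 1e6
```

On a real NVMe drive with caches bypassed, the 4 KB random variant typically lands orders of magnitude below the multi-megabyte sequential one.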

Again, on the GPU side you can fetch 1 TB/s, but if your SSD delivers only 10 MB/s, most of your chips will be wasting compute cycles waiting for data…

How much faster is UnumDB?

We generally benchmark UnumDB against the following competitors:

  1. SQLite is the most minimalistic SQL database.
  2. MySQL is the most widely used Open-Source DBMS in the world.
  3. Postgres is the 2nd most popular Open-Source DBMS. Implemented in 1'300'000 lines of C code.
  4. MongoDB is the most popular NoSQL database. Implemented in 3'900'000 lines of C++ code.
  5. Neo4J is the most popular graph database. Implemented in 800'000 lines of Java code.
  6. ElasticSearch is the most popular indexing software. Implemented in 300'000 lines of Java code on top of the Lucene engine.

Abandoning legacy approaches allowed us to reach:

  • 50x faster dataset imports.
  • 5x smaller compressed representations.
  • 100x faster random lookups.
  • 10x faster batched insertions.
  • a compact implementation in 100'000 lines of C++ code.