Why would an AI-Research team spend years designing new storage infrastructure? Well, let’s look at hardware.
A high-end server like the DGX A100 costs around $300'000 today and comes with the following components:
- 2x CPUs (with shared RAM),
- 8x GPUs (each with private VRAM) and
- 2x SSDs.
Almost the entire cost of the hardware is the cost of logic and volatile memory - RAM. RAM is fast, but also small. Chances are, your computer spends most of its time simply fetching and sending data instead of actually “computing”.
| Channel | Bandwidth | Memory Volume / Socket |
|---|---|---|
| CPU ⇌ DDR4 RAM | ~100 GB/s | 1'000 GB |
| GPU ⇌ HBM2 VRAM | ~1'000 GB/s | 300 GB |
| GPU ⇌ GPU via NVLink | ~300 GB/s | - |
| CPU ⇌ GPU via PCI-E Gen3 x16 | ~15 GB/s | - |
| CPU ⇌ SSD via NVMe PCI-E Gen3 x4 | « 4 GB/s | 1'000'000 GB |
For us, Big Data begins where RAM capacity ends. We often store 10-1'000 TB of data on a single machine and need to navigate and analyze it in real time. The CPU ⇌ SSD link is the slowest in the chain. Its theoretical throughput (for Gen3 PCI-E) is 4 GB/s, but real-world reads on small files often degrade to 200 MB/s. The reality is even more depressing if you store your data in a structured DBMS. Those mix frequent random reads and writes, which slows SSDs down to 10 MB/s.
Again, on the GPU side you can fetch 1 TB/s, but if your SSD delivers only 10 MB/s, most of your chips will be wasting compute cycles…
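The gap is easier to feel as numbers. Here is a minimal back-of-the-envelope sketch using the approximate bandwidths from the table above; all figures are rough order-of-magnitude assumptions, not measurements:

```python
# Rough sketch: how long each channel needs to move 1 TB,
# using the approximate bandwidth figures from the table above.
TB = 1_000_000_000_000  # bytes, decimal convention as in vendor specs

bandwidths_bytes_per_s = {
    "CPU <-> DDR4 RAM":                 100e9,  # ~100 GB/s
    "GPU <-> HBM2 VRAM":                1000e9, # ~1 TB/s
    "GPU <-> GPU (NVLink)":             300e9,  # ~300 GB/s
    "CPU <-> GPU (PCI-E Gen3 x16)":     15e9,   # ~15 GB/s
    "CPU <-> SSD (NVMe, theoretical)":  4e9,    # 4 GB/s
    "CPU <-> SSD (DBMS random I/O)":    10e6,   # ~10 MB/s
}

for channel, bw in bandwidths_bytes_per_s.items():
    seconds = TB / bw
    print(f"{channel:32s} {seconds:>12,.0f} s to move 1 TB")

# A GPU that can consume ~1 TB/s but is fed at ~10 MB/s is busy
# roughly one part in 100'000 of the time; the rest is idle.
utilization = 10e6 / 1000e9
print(f"GPU utilization when fed from a busy SSD: {utilization:.5%}")
```

At RAM speed, 1 TB moves in about 10 seconds; through a DBMS-thrashed SSD at 10 MB/s, the same terabyte takes roughly 100'000 seconds, i.e. more than a day.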
How much faster is UnumDB?
We generally benchmark UnumDB against the following competitors:
- SQLite is the most minimalistic SQL database.
- MySQL is the most widely used Open-Source DBMS in the world.
- Postgres is the 2nd most popular Open-Source DBMS. Implemented in 1'300'000 lines of C code.
- MongoDB is the most popular NoSQL database. Implemented in 3'900'000 lines of C++ code.
- Neo4J is the most popular graph database. Implemented in 800'000 lines of Java code.
- ElasticSearch is the most popular indexing software. Implemented in 300'000 lines of Java code on top of the Lucene engine.
Abandoning legacy approaches allowed us to reach:
- 50x faster dataset imports.
- 5x smaller compressed representations.
- 100x faster random lookups.
- 10x faster batched insertions.
- compact implementation in 100'000 lines of C++ code.