Why Build a Custom DBMS?
Why would an AI research team spend years designing new storage infrastructure? Well, let’s look at the hardware.
A high-end server like the DGX A100 today costs around $300'000 and comes with the following components:
- 2x CPUs (with shared RAM),
- 8x GPUs (each with private VRAM) and
- 2x SSDs.
Almost the entire cost of that hardware is the cost of logic and volatile memory: RAM. RAM is fast, but also small. Chances are, your computer wastes most of its time simply fetching and sending data instead of actually “computing”.
| Channel | Bandwidth | Memory Volume / Socket |
| --- | --- | --- |
| CPU ⇌ DDR4 RAM | ~100 GB/s | 1'000 GB |
| GPU ⇌ HBM2 VRAM | ~1'000 GB/s | 300 GB |
| GPU ⇌ GPU via NVLink | ~300 GB/s | - |
| CPU ⇌ GPU via PCI-E Gen3 x16 | ~15 GB/s | - |
| CPU ⇌ SSD via NVMe PCI-E Gen3 x4 | < 4 GB/s | 1'000'000 GB |
For us, Big Data begins where RAM capacities end. We often store 10-1'000 TB of data on a single machine and need to navigate and analyze it in real time. The CPU ⇌ SSD link is the slowest in the chain. Its theoretical throughput (for Gen3 PCI-E) is 4 GB/s, but real-world reads on small files often degrade to 200 MB/s. The reality is even more depressing if you store your data in a structured DBMS: those mix frequent random reads and writes, which slows SSDs down to 10 MB/s.
Again, on the GPU side you can fetch 1 TB/s, but if your SSD delivers only 10 MB/s, most of your chips will be wasting compute cycles…
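To put these numbers in perspective, here is a back-of-the-envelope sketch in plain Python, using the rounded bandwidth figures from the table and prose above. The 10 TB dataset size is just an illustrative assumption:

```python
# Back-of-the-envelope: time to stream a 10 TB dataset over each link,
# using the rounded bandwidth figures quoted above.
dataset_gb = 10_000  # 10 TB, an illustrative size

links_gb_per_s = {
    "GPU <-> HBM2 VRAM": 1_000,
    "GPU <-> GPU via NVLink": 300,
    "CPU <-> DDR4 RAM": 100,
    "CPU <-> GPU via PCI-E Gen3 x16": 15,
    "CPU <-> SSD, sequential NVMe": 4,
    "CPU <-> SSD, small random reads": 0.2,     # ~200 MB/s
    "CPU <-> SSD, mixed reads & writes": 0.01,  # ~10 MB/s
}

for name, gb_per_s in links_gb_per_s.items():
    hours = dataset_gb / gb_per_s / 3600
    print(f"{name:35s} {hours:10.2f} hours")
```

At 10 MB/s, the same scan that VRAM could serve in ten seconds takes over eleven days. The storage layer, not the silicon, sets the ceiling.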
How Much Faster Is UnumDB?
We generally benchmark UnumDB against the following competitors (a minimal timing sketch follows the list):
- SQLite is the most minimalistic SQL database.
- MySQL is the most widely used Open-Source DBMS in the world.
- Postgres is the 2nd most popular Open-Source DBMS. Implemented in 1'300'000 lines of C code.
- MongoDB is the most popular NoSQL database. Implemented in 3'900'000 lines of C++ code.
- Neo4J is the most popular graph database. Implemented in 800'000 lines of Java code.
- ElasticSearch is the most popular indexing software. Implemented in 300'000 lines of Java code on top of the Lucene engine.
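For a sense of what a “random lookups” comparison involves, below is a minimal sketch of a timing harness against SQLite, the simplest of the baselines above. It is not our actual benchmark suite; the schema, row count, value size, and key distribution are illustrative assumptions:

```python
# Minimal sketch of a random-lookup timing harness against SQLite.
# Schema, row count, value size and key distribution are illustrative.
import os
import random
import sqlite3
import time

rows = 1_000_000
lookups = 100_000

db = sqlite3.connect("bench.sqlite")
db.execute("CREATE TABLE IF NOT EXISTS kv (key INTEGER PRIMARY KEY, value BLOB)")
db.execute("DELETE FROM kv")
db.executemany(
    "INSERT INTO kv VALUES (?, ?)",
    ((i, os.urandom(64)) for i in range(rows)),
)
db.commit()

# Time point queries against uniformly random keys.
keys = [random.randrange(rows) for _ in range(lookups)]
start = time.perf_counter()
for key in keys:
    db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
elapsed = time.perf_counter() - start
print(f"{lookups / elapsed:,.0f} lookups/s")
```

The same harness shape, a bulk load followed by timed point queries, carries over to the other engines through their own drivers.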
Abandoning legacy approaches allowed us to achieve:
- 50x faster dataset imports.
- 5x smaller compressed representations.
- 100x faster random lookups.
- 10x faster batched insertions.
- compact implementation in 100'000 lines of C++ code.