2021 was terrific for Database Management Software and startups in general. While classical SQL is shrinking, the data-management market as a whole is booming at a 17% CAGR and will reach $150 Billion by 2026, according to Gartner. That growth, plus the hype, allowed dozens of DBMS startups to raise more capital last year alone than in their entire preceding decade-long history. For the 13 companies in our previous extended comparison, it meant swallowing $4.5 Billion of VC money.

With so many players and such high stakes, there has to be an evaluation metric - a way to sort the wheat from the chaff. There are two:

  • YCSB: Yahoo Cloud Serving Benchmark,
  • TPC: Transaction Processing Performance Council.

Those cover different workloads. The first is for Key-Value Stores (KVS), and the second is mostly for SQL DBMSs, which are built on top of a KVS. So if you are building a DBMS, it makes sense to use both: one for the persistent data structures and one for the higher-level logic. As expected, we use both and outperform other players in both, but we will skip the TPC for now.

With around 4K ⭐ on GitHub, YCSB is the popular option. In the past, we have used it extensively, and our previous article covers a lot we can skip this time:

  • How 🦄 are built on top of open-source RocksDB and WiredTiger: jump
  • The liquid-cooled 👹 monster hardware we use for benchmarking: here
  • 100 MB, 1 GB, 10 GB and 100 GB results: here

As we previously promised, we are back with expanded datasets and new optimizations, but they are not just inside UnumDB! After careful evaluation, we decided to rewrite the original YCSB package, extending and updating it along the way. Oh, and it’s open-source - check it out on GitHub 🤗 If you just want to see the new results - here you go. Spoiler:

UCSB Benchmark Duration for RocksDB, MongoDB and UnumDB

Overall, designing new benchmarks isn’t considered good form. Especially if you are going to measure your own (hopefully upcoming) product with them: it becomes too easy to prioritize the operations you are good at and neglect the others. So we preserved the principal part of YCSB - its canonical random key generators - and the three most misleading letters of the name 😅

We will talk about many things, including:

  • A benchmark for High-Performance Software must be High-Performance Software in itself.
  • Tracking hardware resource usage from a separate process, Valgrind style.
  • ACID guarantees and multithreading in Key-Value Stores.
  • Cost of running a DBMS in a Docker container.
  • The impact of SLC vs MLC vs TLC NAND modes on DBMS speed.
  • 1 TB results for RocksDB, UnumDB and the others.

If it sounds interesting, let’s jump in!

Performance is a Feature

The original YCSB was published over 10 years ago and targeted isolated DBMS applications. Those run in a separate process, in a different address space, and communicate through sockets, often via plain-text commands. It was simple enough to be understandable and diverse enough to be broadly applicable, so it took off. People like us have applied it to systems that are much more “low-level” than, let’s say, Amazon DynamoDB, Apache Cassandra or ElasticSearch.

In those 10 years, the hardware has changed. Let’s compare AMD CPUs from those two eras:

|                 | 2012                 | 2022                 |
| Top CPU Model   | Athlon II X4 651K    | EPYC 7773X           |
| Lithography     | 32 nm                | 7 nm                 |
| TDP             | 100 Watt             | 280 Watt             |
| Core Count      | 4                    | 64                   |
| Clock Frequency | 3.0 GHz              | 2.2 - 3.5 GHz        |
| Cache Size      | 4 MB                 | 804 MB               |
| PCIe            | 20x Gen2             | 128x Gen4            |
| PCIe Bandwidth  | 10 GB/s              | 256 GB/s             |
| RAM             | 2x channel DDR3-1866 | 8x channel DDR4-3200 |
| RAM Bandwidth   | 30 GB/s              | 204 GB/s             |

In reality, not all of that theoretical bandwidth is always attainable, but I guess you don’t need cpu-world.com to agree that CPUs have changed!

The same applies to SSDs and GPUs, which storage-level software heavily underutilizes. The software must harness all of that speed and parallelism, but that’s only feasible in low-level languages.

Java & Java-like

All performant KVS are implemented in C++, while YCSB is implemented in Java. This means you need some form of “Foreign Function Interface” (FFI) to interact with the KVS. That immediately adds unnecessary work for the CPU, but it’s a minor problem compared to the rest.

Example 1

Every language and its ecosystem has different priorities. Java focuses on the simplicity of development, while C++ trades it for higher performance.

private static String getRowKey(String db, String table, String key) {
    return db + ":" + table + ":" + key;
}
The above snippet is from the FoundationDB (of Apple & Snowflake fame) adapter inside YCSB, but it’s identical across the entire repo. It’s responsible for generating keys for queries. Here is what a modern, recommended C++ version would look like:

auto get_row_key(std::string_view db, std::string_view table, std::string_view key) {
    return std::format("{}:{}:{}", db, table, key);
}
My entire Java experience is about 1 week long and happened over 10 years ago. So take the next section with a grain of salt.

From Java 7 onwards, the Java String Pool lives in heap space, which is garbage-collected by the JVM. This code will produce a StringBuilder - a heap-allocated array of pointers to heap-allocated strings - later materializing into the final concatenated String. On the heap again, of course. If we know something about High-Performance Computing, it’s that the heap is expensive, and together with Garbage Collection and multithreading it becomes completely intolerable. The same applies to the C++ version. Yes, we are doing only one allocation there, but it is still too slow to be called HPC. We need to replace std::format with std::format_to and export the result into a reusable buffer.

Example 2

If one example is not enough, below is the snippet that produces random integers before packing them into the String key.

long nextLong(long itemcount) {
    // from "Quickly Generating Billion-Record Synthetic Databases", Jim Gray et al, SIGMOD 1994
    if (itemcount != countforzeta) {
        synchronized (this) {
            if (itemcount > countforzeta) {
                // ... recompute zetan for the larger item count ...
            }
        }
    }

    double u = ThreadLocalRandom.current().nextDouble();
    double uz = u * zetan;

    if (uz < 1.0)
        return base;
    if (uz < 1.0 + Math.pow(0.5, theta))
        return base + 1;

    long ret = base + (long) ((itemcount) * Math.pow(eta * u - eta + 1, alpha));
    return ret;
}

To generate a long, we are doing numerous operations on doubles - by far the most computationally expensive numeric type on modern computers, except for integer division. Aside from that, this PRNG contains four if statements and a synchronized (this) mutex. Generating a random integer for most distributions generally fits within 50 CPU cycles or 10 nanoseconds. In this implementation, every if branch may cost that much, and the mutex may cost orders of magnitude more!

It looks like a severe systemic problem to me, so we have searched for C/C++ ports.

Existing C++ Ports

We are not the first to consider porting, but the existing implementations aren’t popular. They solve the first issue - FFI is no longer needed to call LevelDB, RocksDB or other C++ persistent data structure libs - but they don’t solve the other problems.

inline uint64_t ZipfianGenerator::Next(uint64_t num) {
    assert(num >= 2 && num < kMaxNumItems);
    std::lock_guard<std::mutex> lock(mutex_);

    if (num > n_for_zeta_)
        eta_ = Eta();

    double u = utils::RandomDouble();
    double uz = u * zeta_n_;
    if (uz < 1.0)
        return last_value_ = 0;
    if (uz < 1.0 + std::pow(0.5, theta_))
        return last_value_ = 1;
    return last_value_ = base_ + num * std::pow(eta_ * u - eta_ + 1, alpha_);
}

Again, we are generating random numbers under a mutex, which in turn calls a static std::default_random_engine here. Even the most straightforward functions cause expensive on-heap copies and throw exceptions:

inline bool StrToBool(std::string str) {
    std::transform(str.begin(), str.end(), str.begin(), ::tolower);
    if (str == "true" || str == "1") {
        return true;
    } else if (str == "false" || str == "0") {
        return false;
    } else {
        throw Exception("Invalid bool string: " + str);
    }
}
A step in the right direction, but it still causes malloc-backed heap allocations in every function interface. Exceptions, for reference, are banned in half of the companies using C++, including Google.

Being Future-Proof

In 2023, we will be looking at up to 2 Million random 4KB read operations per second on the next-gen Intel Optane. With just 24 drives in a 2U chassis, that’s 50 MOps/s, or 200 GB/s - far more than your memory system can sustain with copies, let alone your heap allocator. Even the Linux kernel is expected to choke at around 10 MOps/s, let alone the JVM and most software ever written. We started by using older C++ ports with wrappers for RocksDB, LevelDB and LMDB. Then:

  • We added a WiredTiger backend, which is the foundation of MongoDB.
  • We normalized and extended the configuration files.
  • We removed a few more of those Java-ish inefficiencies.

By that time, it was easier to throw away everything except a couple of classes and rebuild the rest around the Google Benchmark suite.

In every complicated situation, start from scratch!

Ancient C++ Wisdom

New Workloads

Before getting into the tedious intricacies, let’s spice things up a little. YCSB had 6 mixed workloads, from A to F, plus initialization. Those mostly do Read+Update, Read+Insert, Read-Only and Write-Only operations.

Good, but not enough. Today, state-of-the-art language models are trained on CommonCrawl samples. That dataset contains 300 TB worth of HTML, and it’s just one of many datasets used to solve one of many AI problems.

In NLP Everything Is Getting Bigger

To work with such volumes, we wanted more “verbs” than just “set” and “get”, but we had to trim some fat to keep things brief. Instead of the 7 initial workloads, we have 8:

  • ∅: imports monotonically increasing keys 🔄
  • A: 50% reads + 50% updates, all random
  • C: reads, all random
  • D: 95% reads + 5% inserts, all random
  • E: range scan 🔄
  • X: batch read 🆕
  • Y: batch insert 🆕
  • Z: scans 🆕

The ∅ was previously implemented as one-by-one inserts, but some KVS support the external construction of their internal representation files. The E was previously mixed with 5% insertions.

How We Configured the DBs

We concentrated on benchmarking the following state-of-the-art Key-Value Stores:

  1. WiredTiger. Version 10.0.0.
  2. LevelDB. Version 1.23.
  3. RocksDB. Version 6.29.3 (fork of LevelDB).
  4. UnumDB. Version 0.2.
  5. LMDB is also supported, but it was too slow to include in the charts. Version 0.9.29.

UnumDB is currently in a pre-release form, but we use it internally in a broad set of configurations on terabyte-scale collections.

Memory Limits

Every KVS supports setting a RAM limit, which we set to 10% or, more commonly, 5% of the overall database size. It’s a typical server setting, as you generally have at least 10x less RAM than disk space. Desktop setups are even less balanced. Many users have just 16 GB of RAM and a 1 TB SSD, meaning a 60x gap.


None of the DBs relies on custom compression - they all reuse similar sets of open-source compression libs, like Snappy and Zlib. We want to benchmark the DBs, not the compression, so we disabled it across all deployments.

Disk Representation

RocksDB famously has multiple file formats. By default, it uses the BlockBasedTable for SSTs, but it also provides the PlainTable and BlobDB with separate files for keys and values. The latter ones were unstable and feature-incomplete, so we took the default.

The configuration files for RocksDB, for example, contain over a hundred lines, including settings for the Write-Ahead-Log, flushing guarantees, Skip-List capacity, file sizes, LSM tree growth factors, Bloom-filter specs and more. RocksDB has the biggest codebase and is probably the hardest to read, understand and maintain. The cloc utility measured its codebase at ≈ 650'000 lines of code. Removing blanks and noise, here is what we get:

|                  | LevelDB | RocksDB | WiredTiger | UnumDB |
| Code & Tests     | ≈ 20 K  | ≈ 375 K | ≈ 130 K    | ≈ 19 K |
| Comments         | ≈ 4 K   | ≈ 107 K | ≈ 95 K     | ≈ 3 K  |
| Wrappers & Other | ≈ 3 K   | ≈ 90 K  | ≈ 85 K     | ≈ 1 K  |

Even though UnumDB is slightly smaller than LevelDB, it’s fair to say that LevelDB has the most readable codebase. This, however, comes at the cost of performance.

RocksDB truly stands out in its complexity. So we tried to stick to default configs with minimal changes.

Supported Verbs

KVS variants differ in supported operations. Many operations are not available natively, so they were simulated using the fastest available functionality.

|              | WiredTiger | LevelDB | RocksDB | UnumDB |
| Insert       | ✅         | ✅      | ✅      | ✅     |
| Select       | ✅         | ✅      | ✅      | ✅     |
| Remove       | ✅         | ✅      | ✅      | ✅     |
| Scan         | ✅         | ✅      | ✅      | ✅     |
| Initialize   | ✅         | ❌      | ✅      | ✅     |
| Batch Select | ❌         | ❌      | ✅      | ✅     |
| Batch Insert | ❌         | ✅      | ✅      | ✅     |

There is also asymmetry elsewhere:

  • WiredTiger supports fixed size integer keys.
  • LevelDB only supports variable length keys and values.
  • RocksDB has minimal support for fixed_key_len, incompatible with BlockBasedTable.
  • UnumDB supports both fixed size keys and values.

Just like YCSB, we use 8-byte integer keys and 1000-byte values. Both WiredTiger and UnumDB were configured to use integer keys natively. Our RocksDB wrapper reverses the byte order of keys, so that the native lexicographic comparator orders them numerically. None of the DBs was set to use fixed-size values, as only UnumDB supports that.

Caveats We Faced

If you use Google Benchmark, you know its nifty tricks, like DoNotOptimize or the automatic resolution of the number of iterations at runtime. It’s widespread in micro-benchmarking, but it begs for extensions once you start profiling a DBMS. The ones shipped with UCSB spawn a sibling process that samples usage statistics from the OS. Valgrind-style, we read from /proc/* files and aggregate stats like SSD I/O and overall RAM usage.

Durability vs Speed

Unlike humans, ACID is one of the best things that can happen to a DBMS 😍

Like all good things, full ACID is unattainable, because of at least one property - Durability. Absolute durability is practically impossible, and high durability is expensive.

All high-performance DBs are designed as Log-Structured Merge Trees. It’s a design that essentially bans in-place file overwrites. Instead, it builds layers of immutable files arranged in a tree-like order. The problem is that until you have enough content to populate an entire top-level file, you keep the data in RAM - in structures often called MemTables.

LSM Tree

If the lights go off, volatile memory will be discarded. So a copy of every incoming write is generally appended to a Write-Ahead-Log (WAL). Two problems here:

  1. You can’t issue a full write confirmation before appending to the WAL. It’s still a write to disk: a system call, a context switch to kernel space. Want to avoid it with io_uring or SPDK? Then be ready to change all the above logic to work in an async manner, fast enough not to create a new bottleneck. Hint: std::async will not cut it.
  2. The WAL functionally steps on the toes of higher-level logic. Every wrapping DBMS generally implements such mechanisms itself, so they disable the WAL in the KVS to avoid extra stalls and replication. Example: Yugabyte is a port of Postgres to RocksDB, and it disables the embedded WAL.

We generally disable WAL and benchmark the core. Still, you can tweak all of that in the UCSB configuration files yourself.

Furthermore, as widely discussed, flushing the data still may not guarantee its preservation on your SSD. So pick your poison: choose your hardware wisely and tune your benchmarks cautiously.

Strict vs Flexible RAM Limits

When users specify a RAM limit for a KVS, they expect all of the required in-memory state to fit into that many bytes. That would be too obvious for modern software, so here is one more problem.

Fast I/O is hard. The faster you want it, the more abstractions you will need to replace.

graph LR
    Application -->|libc| LIBC[Userspace Buffers]
    Application -->|mmap| PC[Page Cache]
    Application -->|mmap+O_DIRECT| BL[Block I/O Layer]
    Application -->|SPDK| DL[Device Layer]
    LIBC --> PC
    PC --> BL
    BL --> DL

Generally, the OS keeps copies of the requested pages in the RAM cache. To avoid that, enable O_DIRECT. It will slow down the app and require some more engineering. For one, all the disk I/O will have to be aligned to page sizes, generally 4KB, and that includes both the offset in the file and the address of the userspace buffers. Split-loads must also be managed with extra code on your side. So most KVS solutions (except for UnumDB, of course 😂) don’t bother implementing very fast I/O, like SPDK. In that case, they can’t even know how much RAM the underlying OS has reserved for them. So we have to configure them carefully and, ideally, add external constraints:

systemd-run --scope -p MemoryLimit=100M /path/ucsb

Now a question. Let’s say you want to mmap files and be done with it - after all, Linux can do a far better job at managing caches than most DBs. In that case, the memory usage will always be very high, but within the limits of the process. As soon as we near the limit, the OS will drop the old caches. So which is better: using the least RAM possible, or using the most RAM until the limit?

For our cloud-first offering, we will favour the second option. It will give the users the most value for their money on single-purpose instances.

Furthermore, we allow and enable “Workload Isolation” in UCSB by default. It creates a separate process and a separate address space for each workload of each DB. In between, we flush the whole system: the caches filled during insertion benchmarks will be invalidated before the reads begin. This makes the numbers more reliable, but limits concurrent benchmarks to one.

In one of the next articles, we will write about the in-hardware Memory Management Unit and the Linux mmap implementation, so subscribe 🤗

Dataset Size & NAND Modes

Large capacity SSDs store multiple bits per cell. If you are buying a Quad Level Cell SSD, you expect each of them to store 4 bits of relevant information. That may be a false expectation.


The SSD can switch to SLC mode during intensive writes, where I/O is faster, especially if a lot of free space is available. In the case of an 8 TB SSD, before we reach 2 TB of used space, all NAND arrays can, in theory, be populated with just one relevant bit per cell.

SLC vs eMLC vs MLC vs TLC

If you are benchmarking the DBMS and not the SSD, ensure that all benchmarks run within the same mode. In our case, for a 1 TB workload on 8 TB drives, it’s either:

  • starting with an empty drive,
  • starting with an 80% full drive.

Listing the Knobs

As you see, there is a lot to take into account, and everyone may be interested in a different setting. To sum things up, here is a functionality comparison between YCSB and UCSB.

|                         | Present in YCSB | Present in UCSB |
| Size of the dataset     | ✅              | ✅              |
| DB configuration files  | ✅              | ✅              |
| Workload specifications | ✅              | ✅              |
| Tracking hardware usage | ❌              | ✅              |
| Workload Isolation      | ❌              | ✅              |
| Concurrency             | ❌              | ✅              |
| Batch Operations        | ❌              | ✅              |
| Bulk Operations         | ❌              | ✅              |
| Support of Transactions | ❌              | ✅              |

There is too much control flow to tune, so instead of 1'000 CLI arguments, we organized everything into a run.py Python script.

Results 💥

And what’s the point of writing a benchmark if you don’t get to run it! One of the comments on our previous post wondered: why run small workloads on big machines? The answer is: to upscale the experiment within the same environment and analyze its scaling behaviour.

How we set the Knobs this time:

  • Transactions: ❌
  • Concurrent: ❌
  • Workload Isolation: ✅
  • Sizes: 10 GB, 100 GB, 1 TB.

Here is how long one iteration of the benchmark takes:

  • 10 GB: 42 minutes.
  • 100 GB: 5 hours, 54 minutes.
  • 1 TB: 2 days, 10 hours, 6 minutes.

Totalling 2 days, 16 hours, 42 minutes. Benchmark duration by DBMS:

  • WiredTiger: 8 hours, 16 minutes.
  • LevelDB: 1 day, 15 hours, 19 minutes.
  • RocksDB: 12 hours, 8 minutes.
  • UnumDB: 4 hours, 59 minutes.

We reran those benchmarks many times with different settings. Every DBMS received its own empty 8 TB Samsung SSD. Later, we will drop the slower DBs and focus on bigger setups that don’t fit on one SSD:

  • Transactions: ✅
  • Concurrent: ✅
  • Workload Isolation: ✅
  • Sizes: 10 TB, 50 TB.

The 🥈 and 🥉 places often change, but the 🥇 leader remains constant, with the performance difference often being 2x-5x against the second-best solution in each workload. Sometimes this speed comes at the cost of using more RAM, but not always. If the gap between 🥇 and 🥈 is bigger than the gap within any consecutive pair of entries in the leaderboard, we mark the result with 🏅.

0: Bulk Initialization

Initializing the KVS is done via monotonically ascending keys. The original YCSB initialization always happens one key at a time. We went a step further and implemented bulk inserts: when possible, we construct big DB-compatible files externally and then submit them to the KVS. WiredTiger, RocksDB and UnumDB natively support that; LevelDB doesn’t.

This is vastly faster than inserting data one by one, or even in batches! Think of it as your zero-to-hero time: how fast can you import all your Parquet files from S3 buckets before you start working with them.

| Brand                | CPU Usage  | RAM Usage  | Disk Usage | Speed     |
| WiredTiger ⊃ MongoDB | 1.00 cores | 3.99 GiB   | 989.77 GiB | 1.1M 🥈   |
| LevelDB ⊃ Google     | 0.99 cores | 904.12 MiB | 984.19 GiB | 30.2K     |
| RocksDB ⊃ Facebook   | 0.89 cores | 2.65 GiB   | 976.27 GiB | 385.3K 🥉 |
| UnumDB ⊃ Unum        | 1.00 cores | 3.90 GiB   | 968.58 GiB | 2.2M 🥇🏅 |

A: 50% Random Reads + 50% Random Updates

Every operation is a random single-element operation. Half of them are reads, and half are updates for existing keys.

| Brand                | CPU Usage  | RAM Usage | Disk Usage | Speed      |
| WiredTiger ⊃ MongoDB | 3.19 cores | 23.67 GiB | 1.02 TiB   | 101.1K 🥉  |
| LevelDB ⊃ Google     | 1.78 cores | 2.54 GiB  | 989.39 GiB | 115.2K 🥈  |
| RocksDB ⊃ Facebook   | 1.50 cores | 4.95 GiB  | 978.55 GiB | 86.8K      |
| UnumDB ⊃ Unum        | 1.00 cores | 10.01 GiB | 976.32 GiB | 197.3K 🥇🏅 |

C: 100% Random Reads

Again, not a particularly interesting benchmark, but a common case when dealing with poorly optimized software. Use the batched approach whenever possible.

| Brand                | CPU Usage  | RAM Usage  | Disk Usage | Speed     |
| WiredTiger ⊃ MongoDB | 1.39 cores | 17.43 GiB  | 989.77 GiB | 146.6K 🥉 |
| LevelDB ⊃ Google     | 1.12 cores | 105.41 MiB | 984.40 GiB | 30.6K     |
| RocksDB ⊃ Facebook   | 0.97 cores | 4.72 GiB   | 976.27 GiB | 160.7K 🥈 |
| UnumDB ⊃ Unum        | 0.98 cores | 1.42 GiB   | 972.30 GiB | 175.7K 🥇 |

D: 95% Random Reads + 5% Random Inserts

Unlike A, this benchmark inserts new key-value pairs instead of updating previously existing ones.

| Brand                | CPU Usage  | RAM Usage  | Disk Usage  | Speed     |
| WiredTiger ⊃ MongoDB | 2.03 cores | 18.48 GiB  | 1.02 TiB    | 182.4K 🥈 |
| LevelDB ⊃ Google     | 1.85 cores | 232.08 MiB | 1.01 TiB    | 20.1K     |
| RocksDB ⊃ Facebook   | 0.99 cores | 5.04 GiB   | 1.00 TiB    | 173.3K 🥉 |
| UnumDB ⊃ Unum        | 1.01 cores | 17.71 GiB  | 1022.78 GiB | 189.9K 🥇 |

E: Range Select

Here we randomly select a key and then retrieve the following 100 values. One can easily change the scan length through settings and even define it through provided probability distributions.

| Brand                | CPU Usage  | RAM Usage  | Disk Usage | Speed      |
| WiredTiger ⊃ MongoDB | 0.38 cores | 4.54 GiB   | 989.77 GiB | 250.2K 🥈  |
| LevelDB ⊃ Google     | 0.32 cores | 50.79 MiB  | 983.99 GiB | 236.3K 🥉  |
| RocksDB ⊃ Facebook   | 0.34 cores | 4.72 GiB   | 976.27 GiB | 177.3K     |
| UnumDB ⊃ Unum        | 0.52 cores | 344.49 MiB | 972.30 GiB | 384.3K 🥇🏅 |

X: Batch Reads

It is a benchmark of “batch selections”, where instead of submitting one read operation at a time and waiting for it synchronously, you request a batch of, let’s say, 256 random keys. It enables the DBMS to execute them in a batch-asynchronous fashion, reordering separate operations internally for speed.

This workload is significant for BI and analytical use cases, network/graph analysis being a perfect example. At every step of your dataset exploration, you fetch data from very different parts of the dataset to produce insights, and that requires fast “batch selections”!

| Brand                | CPU Usage  | RAM Usage  | Disk Usage | Speed      |
| WiredTiger ⊃ MongoDB | 0.50 cores | 14.29 GiB  | 989.77 GiB | 40.9K 🥉   |
| LevelDB ⊃ Google     | 0.64 cores | 55.87 MiB  | 983.99 GiB | 16.5K      |
| RocksDB ⊃ Facebook   | 0.35 cores | 9.07 GiB   | 976.27 GiB | 51.8K 🥈   |
| UnumDB ⊃ Unum        | 0.37 cores | 830.48 MiB | 972.30 GiB | 303.8K 🥇🏅 |

Moreover, today’s AI researchers mostly train their neural networks after uniformly shuffling the datasets. It is a simple approach, but is it the best strategy if we can rapidly sample batches?

Y: Batch Insert

Instead of inserting one value at a time, we submit batches of up to 10'000 values, depending on the DB size. It’s natively supported by LevelDB, RocksDB and UnumDB.

This is different from a transaction with 256 new entries in it: if one of the operations fails, we want the remaining ones to proceed. Transactions add rollback functionality and will be the subject of future publications.

| Brand                | CPU Usage  | RAM Usage  | Disk Usage | Speed     |
| WiredTiger ⊃ MongoDB | 1.00 cores | 227.22 MiB | 1.02 TiB   | 450.3K 🥉 |
| LevelDB ⊃ Google     | 0.99 cores | 602.71 MiB | 1.11 TiB   | 41.1K     |
| RocksDB ⊃ Facebook   | 1.71 cores | 5.63 GiB   | 1.10 TiB   | 690.2K 🥈 |
| UnumDB ⊃ Unum        | 1.00 cores | 2.38 GiB   | 1.04 TiB   | 813.2K 🥇 |

Z: Scan

It streams all the data present in the store, which may later be sampled and forwarded into some neural-network training procedure.

| Brand                | CPU Usage  | RAM Usage | Disk Usage | Speed   |
| WiredTiger ⊃ MongoDB | 1.22 cores | 16.80 GiB | 989.77 GiB | 1.4M 🥈 |
| LevelDB ⊃ Google     | 0.59 cores | 7.36 GiB  | 983.99 GiB | 1.2M 🥉 |
| RocksDB ⊃ Facebook   | 0.14 cores | 4.72 GiB  | 976.27 GiB | 84.5K   |
| UnumDB ⊃ Unum        | 0.34 cores | 16.89 GiB | 968.58 GiB | 1.7M 🥇 |

Comparing LevelDB and RocksDB, we see their rivalry: one is a lot better for sequential scan-like workloads, while the other is better at randomized operations. The most astonishing result is that UnumDB is better at both! WiredTiger often performs well too, but almost always uses the most energy and memory to get there.

| Brand      | 0    | A    | C  | D  | E    | X    | Y  | Z  |
| WiredTiger | 🥈   | 🥉   | 🥉 | 🥈 | 🥈   | 🥉   | 🥉 | 🥈 |
| LevelDB    |      | 🥈   |    |    | 🥉   |      |    | 🥉 |
| RocksDB    | 🥉   |      | 🥈 | 🥉 |      | 🥈   | 🥈 |    |
| UnumDB     | 🥇🏅 | 🥇🏅 | 🥇 | 🥇 | 🥇🏅 | 🥇🏅 | 🥇 | 🥇 |

This may not sound relevant, but it is. Most companies operate 2 data stores separately - one for small real-time transactions and another for big analytical queries. This means more engineers, complex synchronization and outdated analytics. We hope to replace that entire mess with a single solution.

Is this the limit? ⚡

Far from it! We run those benchmarks with our hands essentially cuffed. Here is a broader picture:

  • When put in a Docker container, most KVS slow down by 15%; UnumDB - by around 7%.
  • On ARM, the gap between UnumDB and the second-best solution is generally more prominent than on x86, due to our adoption of the Neon instruction family.
  • Multithreaded and transactional operations are again a lot faster, as we invest heavily in concurrent in-memory data structures far more performant and memory-efficient than a concurrent Skip-List can be.
  • Initialization order has an unprecedented effect on the speed of the measured operations - when the keys aren’t monotonically rising, other DBs can often be 2x-5x slower.
  • With GPUs enabled, the gap becomes astronomical, but it’s not a fair comparison, as GPUs draw more power: 400W per GPU vs 300W for a fully loaded 64-core CPU.

So prepare to see broader benchmarks soon! Anyway, our KVS performance seems solid for now, so we are focusing on:

  • horizontal scaling and cloud deployment,
  • going from string values to a higher-level type system,
  • supporting arbitrary JSON-like inputs,
  • an SQL-compatibility layer,
  • bindings for dynamic scripting languages.

It might seem too ambitious for a side-project of an AI-oriented team, but we are not stopping any time soon! It’s not just about examining more data faster. It’s also about studying the same data efficiently. In 2020 alone:

  • data transmission networks consumed 260-340 TWh.
  • data centre electricity use was 200-250 TWh.
  • cryptocurrency mining consumed ~100 TWh.

Energy Efficiency at Unum

Totalling 600-700 TWh - as much as the entire 1.26 Billion population of Africa consumed that same year, and over 100x the energy consumption of our home country.

With that in mind, we invite everyone to think twice about the tools we use - at least if we are genuinely committed to making the world a better place through software. It takes only a couple of search queries to realize that CuPy can be 1'000x faster than NumPy. Similarly, other startups modernize the streaming layer, like RedPanda replacing Apache Kafka. We have already announced our non-DBMS research directions for 2022, but all the solutions we are releasing this year target the storage layer. Any computer takes at least 100 microseconds to fetch anything from a fast SSD, meaning that accessing data can be over 200'000x more expensive than processing it. Storage is the slowest piece of the modern computer, so UnumDB should impact millions of applications by relieving their most critical bottleneck!

C++ Meetups in Yerevan

So far, I have personally been spending millions on our research since 2015: offices, high-end servers, R&D teams, you name it 😋 Now we are expanding further, attracting brilliant, inspiring researchers and organizing more technical conferences than ever. What previously felt like an impossible dream of making 🇦🇲 the second 🇮🇱 almost feels like a completed step! If you want to do research with us, join C++ conferences as a speaker, or invest - just let us know! Anyway, who would need Oracle or MongoDB once there is UnumDB? 😉

  • Tweet
  • Subscribe to receive similar articles 📨
  • Reach Us to try the fastest data processing software 🔥