2021 was terrific for Database Management Software and startups in general. While classical SQL is shrinking, the data-management market as a whole is booming at 17% CAGR and will reach $150 Billion in 2026, according to Gartner. That, and the hype, allowed dozens of DBMS startups to raise more capital last year alone than in their entire preceding decade-long history. For the 13 companies in our previous extended comparison, it meant swallowing $4.5 Billion of VC money.

With so many players and such high stakes, there must have been an evaluation metric - a way to sort the wheat from the chaff. There are two:

• YCSB: Yahoo Cloud Serving Benchmark,
• TPC: Transaction Processing Performance Council.

Those cover different workloads. The first is for Key-Value Stores (KVS), and the second is mostly for SQL DBMS built on top of a KVS. So if you are building a DBMS, it makes sense to use both: one for the persistent data structures and one for the higher-level logic. As expected, we use both and outperform other players in both, but we will skip the TPC for now.

With around 4K ⭐ on GitHub, YCSB is the popular option. In the past, we have used it extensively, and our previous article covers a lot we can skip this time:

• How 🦄 are built on top of open-source RocksDB and WiredTiger: jump
• The liquid-cooled 👹 monster hardware we use for benchmarking: here
• 100 MB, 1 GB, 10 GB and 100 GB results: here

As we have previously promised, we are back with expanded datasets and new optimizations, but they are not just inside UnumDB! After careful evaluation, we decided to rewrite the original YCSB package, extending and updating it along the way! Oh, and it’s open-source - check it on GitHub 🤗 If you just want to see the new results - here you go. Spoiler:

Overall, designing new benchmarks isn't considered good form. Especially if you are going to measure your own (hopefully upcoming) product, it makes it too easy to prioritize the operations you are good at and downplay the others. So we preserved the principal part of YCSB - its canonical random key generators and the three most misleading letters of the name 😅

We will talk about many things, including:

• A benchmark for High-Performance Software must be High-Performance Software in itself.
• Tracking hardware resource usage from a separate process, Valgrind style.
• ACID guarantees and multithreading in Key-Value Stores.
• Cost of running a DBMS in a Docker container.
• The impact of SLC vs MLC vs TLC NAND modes on DBMS speed.
• 1 TB results for RocksDB, UnumDB and the others.

If it sounds interesting, let’s jump in!

## Performance is a Feature

The original YCSB was published over 10 years ago and targeted isolated DBMS applications. Those run in a separate process, in a different address space and communicate through sockets, often via plain-text commands. It was simple enough to be understandable and diverse enough to be broadly applicable, so it took off. People like us have applied it to systems that are much more “low-level” than, let’s say Amazon DynamoDB, Apache Cassandra or ElasticSearch.

In those 10 years, the hardware has changed. Let’s compare AMD CPUs from those two eras:

|  | 2012 | 2022 |
| :--- | :--- | :--- |
| Top CPU Model | Athlon II X4 651K | EPYC 7773X |
| Lithography | 32 nm | 7 nm |
| TDP | 100 Watt | 280 Watt |
| Core Count | 4 | 64 |
| Clock Frequency | 3.0 GHz | 2.2 - 3.5 GHz |
| Cache Size | 4 MB | 804 MB |
| PCIe | 20x Gen2 | 128x Gen4 |
| PCIe Bandwidth | 10 GB/s | 256 GB/s |
| RAM | 2x channel DDR3-1866 | 8x channel DDR4-3200 |
| RAM Bandwidth | 30 GB/s | 204 GB/s |

In reality, not all of that theoretical bandwidth is always available, but I guess you don’t need cpu-world.com to agree that CPUs changed!

The same applies to SSDs and GPUs, and storage-level software heavily underutilizes both. The software must harness all of that speed and parallelism, but that is only feasible in low-level languages.

### Java & Java-like

All performant KVS are implemented in C++, while YCSB is implemented in Java. This means that you need some form of a “Foreign Function Interface” to interact with the KVS. This immediately adds unnecessary work for the CPU, but it's a minor problem compared to the rest.

#### Example 1

Every language and its ecosystem has different priorities. Java focuses on the simplicity of development, while C++ trades it for higher performance.

```java
private static String getRowKey(String db, String table, String key) {
  return db + ":" + table + ":" + key;
}
```

The above snippet is from the Apple's & Snowflake's FoundationDB adapter inside YCSB, but it's identical across the entire repo. It's responsible for generating keys for queries. Here is what a modern recommended C++ version would look like:

```cpp
auto get_row_key(std::string_view db, std::string_view table, std::string_view key) {
    return std::format("{}:{}:{}", db, table, key);
}
```

My entire Java experience is about 1 week long and happened over 10 years ago. So take the next section with a grain of salt.

From Java 7 onwards, the Java String Pool lives in the heap space, which is garbage-collected by the JVM. This code will produce a StringBuilder, a heap-allocated array of pointers to heap-allocated strings, later materializing into the final concatenated String - on the heap again, of course. And if we know something about High-Performance Computing, it is that the heap is expensive, and together with Garbage Collection and multithreading it becomes completely intolerable. The C++ version is hardly better. Yes, we are doing only one allocation there, but it is still too slow to be called HPC. We need to replace std::format with std::format_to and export the result into a reusable buffer.
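
A minimal sketch of that buffer-reuse pattern. We use plain `append` here, which has the same effect as `std::format_to` with a back-inserter; the function name and signature are illustrative, not UCSB's actual API:

```cpp
#include <string>
#include <string_view>

// Sketch: reuse one growable buffer across calls, so key construction stops
// allocating once the buffer has grown to its steady-state capacity.
void get_row_key(std::string& buffer, std::string_view db,
                 std::string_view table, std::string_view key) {
    buffer.clear(); // keeps the capacity, drops the contents
    buffer.append(db).append(1, ':').append(table).append(1, ':').append(key);
}
```

With millions of keys per second, that single saved allocation per key is the difference between a benchmark that measures the KVS and one that measures the allocator.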

#### Example 2

If one example is not enough, below is the code snippet, which produces random integers before packing them into String key.

```java
long nextLong(long itemcount) {
    // from "Quickly Generating Billion-Record Synthetic Databases", Jim Gray et al, SIGMOD 1994
    if (itemcount != countforzeta) {
        synchronized (this) {
            if (itemcount > countforzeta) {
                ...
            } else {
                ...
            }
        }
    }

    double u = ThreadLocalRandom.current().nextDouble();
    double uz = u * zetan;
    if (uz < 1.0)
        return base;
    if (uz < 1.0 + Math.pow(0.5, theta))
        return base + 1;
    long ret = base + (long) ((itemcount) * Math.pow(eta * u - eta + 1, alpha));
    setLastValue(ret);
    return ret;
}
```

To generate a long, we are doing numerous operations on doubles - by far the most computationally expensive numeric type on modern computers, except for integer division. Aside from that, this PRNG contains four if statements and a synchronized (this) mutex. Generating random integers for most distributions generally takes under 50 CPU cycles, or about 10 nanoseconds. In this implementation, every if branch may cost that much, and the mutex may cost orders of magnitude more!
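
For contrast, a lock-free hot path is trivial to sketch in C++ with thread-local state. The engine choice below is our own illustration, not what any YCSB port ships:

```cpp
#include <random>

// Each thread owns its engine, so drawing a number takes no locks at all.
// mt19937_64 is an illustrative choice; any per-thread engine works here.
inline double random_double() {
    thread_local std::mt19937_64 engine{std::random_device{}()};
    return std::uniform_real_distribution<double>{0.0, 1.0}(engine);
}
```

The same pattern applies to the Zipfian state itself: keep per-thread copies instead of sharing one generator behind a lock.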

It looks like a severe systemic problem to me, so we have searched for C/C++ ports.

### Existing C++ Ports

We are not the first to consider porting. The existing implementations aren't popular, though. They solve the first issue - not needing FFI to call LevelDB, RocksDB or other C++ persistent data structure libs - but they don't solve the other problems.

```cpp
inline uint64_t ZipfianGenerator::Next(uint64_t num) {
    assert(num >= 2 && num < kMaxNumItems);
    std::lock_guard lock(mutex_);

    if (num > n_for_zeta_) {
        RaiseZeta(num);
        eta_ = Eta();
    }

    double u = utils::RandomDouble();
    double uz = u * zeta_n_;
    if (uz < 1.0)
        return last_value_ = 0;
    if (uz < 1.0 + std::pow(0.5, theta_))
        return last_value_ = 1;
    return last_value_ = base_ + num * std::pow(eta_ * u - eta_ + 1, alpha_);
}
```

Again, we are generating random numbers under a mutex, which in turn calls a static std::default_random_engine here. Even the most straightforward functions cause expensive on-heap copies and throw exceptions:

```cpp
inline bool StrToBool(std::string str) {
    std::transform(str.begin(), str.end(), str.begin(), ::tolower);
    if (str == "true" || str == "1") {
        return true;
    } else if (str == "false" || str == "0") {
        return false;
    } else {
        throw Exception("Invalid bool string: " + str);
    }
}
```

Though a step in the right direction, it still causes malloc-backed heap allocations at every function interface. Exceptions, for reference, are banned in half of the companies using C++, including Google.
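
An allocation-free, exception-free variant is easy to sketch: take a `string_view`, compare case-insensitively without copying, and report failure through the return value instead of a throw. The name and signature are ours, not UCSB's:

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// No std::string copy, no throw: the parse result goes into `value`,
// and the return flag says whether the input was a valid boolean at all.
inline bool str_to_bool(std::string_view str, bool& value) {
    auto matches_lowered = [&](std::string_view expected) {
        if (str.size() != expected.size())
            return false;
        for (std::size_t i = 0; i != str.size(); ++i)
            if (std::tolower(static_cast<unsigned char>(str[i])) != expected[i])
                return false;
        return true;
    };
    if (matches_lowered("true") || str == "1") { value = true; return true; }
    if (matches_lowered("false") || str == "0") { value = false; return true; }
    return false;
}
```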

### Being Future-Proof

In 2023, we will be looking at up to 2 Million random 4KB read operations per second on the next-gen Intel Optane. With just 24 drives in a 2U chassis, we will be getting 50 MOps/s, or 200 GB/s - far more than your memory system can sustain with copies, let alone your heap allocator. Even the Linux kernel is expected to choke at 10 MOps/s, let alone the JVM and most software ever written. We started by using older C++ ports with wrappers for RocksDB, LevelDB and LMDB. Then:

• We added a WiredTiger backend, which is the foundation of MongoDB.
• We normalized and extended the configuration files.
• We removed a few more of those Java-ish inefficiencies.

By that time, it was easier to throw away everything except a couple of classes and rebuild the rest on top of the Google Benchmark suite.

> In every complicated situation, start from scratch!
>
> ~ Ancient C++ Wisdom

Before getting into the tedious intricacies, let’s spice things up a little. YCSB had 6 mixed workloads, from A to F, plus initialization. Those mostly do Read+Update, Read+Insert, Read-Only and Write-Only operations.

Good, but not enough. Today State-of-the-Art Language Models are trained on CommonCrawl samples. That dataset contains 300 TB worth of HTML. It’s just one of many datasets used to solve one of many AI problems.

To work with such volumes, we wanted more “verbs” than just “set” and “get”, but we had to trim some fat to keep it brief. Instead of the 7 initial workloads, we have 8:

• 0: imports monotonically increasing keys 🔄
• A: 50% reads + 50% updates, all random
• C: reads, all random
• D: 95% reads + 5% inserts, all random
• E: range scan 🔄
• X: batch select 🆕
• Y: batch insert 🆕
• Z: scans 🆕

The 0 workload was previously implemented as one-by-one inserts, but some KVS support the external construction of their internal representation files. The E workload was previously mixed with 5% insertions.

## How We Configured DBs?

We concentrated on benchmarking the following state-of-the-art Key-Value Stores:

1. WiredTiger. Version 10.0.0.
2. LevelDB. Version 1.23.
3. RocksDB. Version 6.29.3 (fork of LevelDB).
4. UnumDB. Version 0.2.
5. LMDB is also supported, but it was too slow to include in the charts. Version 0.9.29.

UnumDB is currently in pre-release form, but we use it internally in a broad set of configurations on terabyte-scale collections.

### Memory Limits

Every KVS supports setting a RAM limit, which we choose to be 10% or, more commonly, 5% of the overall database size. It’s a typical server setting, as you generally have at least 10x less RAM than disk space. Desktop setups are even less balanced than that. Many users have just 16 GB of RAM and a 1 TB SSD, meaning a 60x gap.

### Compression

None of the DBs relies on custom compression. They all reuse similar sets of open-source compression libs, like Snappy and Zlib. We want to benchmark the DBs and not the compression, so we disabled it across all deployments.

### Disk Representation

RocksDB famously has multiple file formats. By default, it uses the BlockBasedTable for SSTs, but also provides the PlainTable and BlobDB with separate files for keys and values. The latter ones were unstable and feature-incomplete, so we took the default.

The configuration files for RocksDB, for example, contain over a hundred lines. They include settings for the Write-Ahead-Log, flushing guarantees, Skip-List capacity, file sizes, LSM tree growth factors, Bloom-filter specs and more. RocksDB has the biggest codebase and is probably the hardest to read, understand and maintain. The cloc utility measured its codebase at ≈ 650'000 lines of code. Removing blanks and noise, here is what we get:

|  | LevelDB | RocksDB | WiredTiger | UnumDB |
| :--- | :--- | :--- | :--- | :--- |
| Code & Tests | ≈ 20 K | ≈ 375 K | ≈ 130 K | ≈ 19 K |
| Comments | ≈ 4 K | ≈ 107 K | ≈ 95 K | ≈ 3 K |
| Wrappers & Other | ≈ 3 K | ≈ 90 K | ≈ 85 K | ≈ 1 K |

Even though UnumDB is slightly smaller, it's fair to say that LevelDB has the most readable codebase. That readability, however, comes at the cost of performance.

RocksDB truly stands out in its complexity. So we tried to stick to default configs with minimal changes.

### Supported Verbs

KVS variants differ in supported operations. Many are not available natively, so they were simulated using the fastest available functionality.

|  | WiredTiger | LevelDB | RocksDB | UnumDB |
| :--- | :---: | :---: | :---: | :---: |
| Insert |  |  |  |  |
| Select |  |  |  |  |
| Remove |  |  |  |  |
| Scan |  |  |  |  |
| Initialize |  |  |  |  |
| Batch Select |  |  |  |  |
| Batch Insert |  |  |  |  |

There is also asymmetry elsewhere:

• WiredTiger supports fixed size integer keys.
• LevelDB only supports variable length keys and values.
• RocksDB has minimal support for fixed_key_len, incompatible with BlockBasedTable.
• UnumDB supports both fixed size keys and values.

Just like YCSB, we use 8-byte integer keys and 1000-byte values. Both WiredTiger and UnumDB were configured to use integer keys natively. Our RocksDB wrapper reverses the byte order of keys to make the native comparator usable. None of the DBs was set to use fixed-size values, as only UnumDB supports that.
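
The byte-order trick can be sketched in a few lines: storing a 64-bit key most-significant byte first makes lexicographic (memcmp-style) ordering, which a default comparator uses, agree with numeric ordering. The function name is ours, for illustration:

```cpp
#include <array>
#include <cstdint>

// Serialize a 64-bit key big-endian, so that comparing the raw bytes
// lexicographically gives the same order as comparing the integers.
std::array<unsigned char, 8> to_ordered_key(std::uint64_t key) {
    std::array<unsigned char, 8> bytes{};
    for (int i = 7; i >= 0; --i, key >>= 8)
        bytes[i] = static_cast<unsigned char>(key & 0xFF);
    return bytes;
}
```

On a little-endian x86 or ARM machine, dumping the integer's native bytes would sort 256 before 1; the big-endian layout avoids a custom comparator altogether.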

## Caveats We Faced

If you use Google Benchmark, you know about its bunch of nifty tricks, like DoNotOptimize or the automatic resolution of the number of iterations at runtime. It's widespread in micro-benchmarking, but it begs for extensions once you start profiling a DBMS. The ones shipped with UCSB spawn a sibling process that samples usage statistics from the OS. Like Valgrind, we read from /proc/* files and aggregate stats like SSD I/O and overall RAM usage.
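
A sketch of that external-sampling idea: parsing a process's resident set size out of /proc/&lt;pid&gt;/status. This is Linux-specific, and the helper below is illustrative, not UCSB's actual sampler:

```cpp
#include <fstream>
#include <string>

// Read the VmRSS field (resident memory, reported in KiB) for a given pid.
// A monitoring process can call this in a loop against the benchmark's pid.
long resident_kib(std::string const& pid) {
    std::ifstream status("/proc/" + pid + "/status");
    std::string line;
    while (std::getline(status, line))
        if (line.rfind("VmRSS:", 0) == 0)
            return std::stol(line.substr(6)); // skip the "VmRSS:" prefix
    return -1; // field not found, or not a Linux /proc filesystem
}
```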

### Durability vs Speed

#### Unlike humans, ACID is one of the best things that can happen to DBMS 😁

Like all good things, ACID is unreachable, because of at least one property - Durability. Absolute Durability is practically impossible and high Durability is expensive.

All high-performance DBs are designed around Log-Structured Merge Trees. It's a design that essentially bans in-place file overwrites. Instead, it builds layers of immutable files arranged in a tree-like order. The problem is that until you have enough content to populate an entire top-level file, you keep the data in RAM, in structures often called MemTables.

If the lights go off, volatile memory will be discarded. So a copy of every incoming write is generally appended to a Write-Ahead-Log (WAL). Two problems here:

1. You can’t have a full write confirmation before appending to WAL. It’s still a write to disk. A system call. A context switch to kernel space. Want to avoid it with io_uring or SPDK, then be ready to change all the above logic to work in an async manner, but fast enough not to create a new bottleneck. Hint: std::async will not cut it.
2. WAL functionally steps on the toes of higher-level logic. Every wrapping DBMS generally implements such mechanisms itself, so they disable the WAL in the KVS to avoid extra stalls and replication. Example: Yugabyte is a port of Postgres to RocksDB and disables the embedded WAL.

We generally disable WAL and benchmark the core. Still, you can tweak all of that in the UCSB configuration files yourself.
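
The write path described above fits in a toy sketch: every put is appended to a WAL before the in-memory MemTable is updated. All names here are illustrative; a real store would fsync, batch, and checksum:

```cpp
#include <fstream>
#include <map>
#include <string>

// Toy write path: WAL first for durability, MemTable second for visibility.
struct ToyStore {
    std::ofstream wal;                           // append-only log on disk
    std::map<std::string, std::string> memtable; // in-RAM sorted structure

    explicit ToyStore(std::string const& wal_path)
        : wal(wal_path, std::ios::app) {}

    void put(std::string const& key, std::string const& value) {
        wal << key << '\t' << value << '\n'; // 1. log the write
        wal.flush();                         // a real store would fsync here
        memtable[key] = value;               // 2. then update the RAM structure
    }
};
```

The flush on every put is precisely the cost discussed above: a syscall and a context switch on the critical path of every single write.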

Furthermore, as widely discussed, flushing the data still may not guarantee its preservation on your SSD. So pick your hardware wisely and tune your benchmarks cautiously.

### Strict vs Flexible RAM Limits

When users specify a RAM limit for a KVS, they expect all of the required in-memory state to fit into that many bytes. That would be too obvious for modern software, so here is one more problem.

Fast I/O is hard. The faster you want it, the more abstractions you will need to replace.

```mermaid
graph LR
  Application -->|libc| LIBC[Userspace Buffers]
  Application -->|mmap| PC[Page Cache]
  Application -->|mmap+O_DIRECT| BL[Block I/O Layer]
  Application -->|SPDK| DL[Device Layer]
  LIBC --> PC
  PC --> BL
  BL --> DL
```

Generally, the OS keeps copies of the requested pages in the RAM cache. To avoid that, enable O_DIRECT. It will slow down the app and require some more engineering. For one, all disk I/O will have to be aligned to page sizes, generally 4KB, which applies both to the offset in the file and to the address of the userspace buffer. Split-loads should also be managed with extra code on your side. So most KVS solutions (except for UnumDB, of course 😂) don't bother implementing very fast I/O, like SPDK. In that case, they can't even know how much RAM the underlying OS has reserved for them. So we have to configure them carefully and, ideally, add external constraints:

```sh
systemd-run --scope -p MemoryLimit=100M /path/ucsb
```
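
If a KVS does take the O_DIRECT route, even a single read needs that alignment bookkeeping. A minimal sketch, with a hard-coded 4 KiB page size and helper names of our own choosing:

```cpp
#include <cstddef>

// O_DIRECT requires file offsets and transfer sizes to be page-aligned, so an
// arbitrary [offset, offset + length) request must be rounded out to page
// boundaries before issuing the read, and trimmed back afterwards.
constexpr std::size_t page_size_k = 4096;

constexpr std::size_t align_down(std::size_t x) {
    return x & ~(page_size_k - 1); // start of the page containing x
}

constexpr std::size_t align_up(std::size_t x) {
    return (x + page_size_k - 1) & ~(page_size_k - 1); // end of that page
}
```

A request for bytes [5000, 5100) thus turns into a read of [4096, 8192), from which the caller copies out the 100 bytes it actually wanted.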

Now a question. Let's say you want to mmap files and be done with it. After all, Linux can do a far better job at managing caches than most DBs. In that case, the memory usage will always be very high, but within the limits of the process, and as soon as we near the limit, the OS will drop the old caches. So which is better: using the least RAM possible, or using the most RAM until the limit?

For our cloud-first offering, we will favour the second option. It will give the users the most value for their money on single-purpose instances.

Furthermore, we allow and enable “Workload Isolation” in UCSB by default. It creates a separate process and a separate address space for each workload of each DB. Between workloads, we flush the whole system, so the caches filled during insertion benchmarks are invalidated before the reads begin. This makes the numbers more reliable, but limits concurrent benchmarks to one.

In one of the next articles we will write about the in-hardware Memory Management Unit and the Linux mmap implementation, so subscribe 🤗

### Dataset Size & NAND Modes

Large-capacity SSDs store multiple bits per cell. If you are buying a Quad Level Cell SSD, you expect each cell to store 4 bits of relevant information. That may be a false expectation.

The SSD can switch to SLC mode during intensive writes, where IO is faster, especially if a lot of space is available. In the case of an 8 TB SSD, before we reach 2 TB used space, all NAND arrays can, in theory, be populated with just one relevant bit.

If you are benchmarking the DBMS, not the SSD, ensure that you run all benchmarks within the same mode. In our case, for a 1 TB workload on 8 TB drives, that means either:

• starting with an empty drive,
• starting with an 80% full drive.

## Listing the Knobs

As you see, there is a lot to take into account, and everyone may be interested in a different setting. To sum things up, here is a functionality comparison between YCSB and UCSB.

|  | Present in YCSB | Present in UCSB |
| :--- | :---: | :---: |
| Size of the dataset |  |  |
| DB configuration files |  |  |
| Tracking hardware usage |  |  |
| Concurrency |  |  |
| Batch Operations |  |  |
| Bulk Operations |  |  |
| Support of Transactions |  |  |

There is too much control-flow to tune, so instead of 1'000 CLI arguments, we organize them into a run.py Python script.

## Results 💥

And then, what's the point of writing a benchmark if you don't get to run it! One of the comments on our previous post wondered why we run small workloads on big machines. The answer is: to upscale the experiment within the same environment and analyze its scaling behaviour.

How we set the Knobs this time:

• Transactions: ❌
• Concurrent: ❌
• Sizes: 10 GB, 100 GB, 1 TB.

Here is how long one iteration of the benchmark takes:

• 10 GB: 42 minutes.
• 100 GB: 5 hours, 54 minutes.
• 1 TB: 2 days, 10 hours, 6 minutes.

Totalling at 2 days, 16 hours, 42 minutes. Benchmark duration by DBMS:

• WiredTiger: 8 hours, 16 minutes.
• LevelDB: 1 day, 15 hours, 19 minutes.
• RocksDB: 12 hours, 8 minutes.
• UnumDB: 4 hours, 59 minutes.

We reran those benchmarks many times with different settings. Every DBMS received its own empty 8 TB Samsung SSD. Later, we will drop the slower DBs and focus on bigger setups that don't fit on one SSD:

• Transactions: ✅
• Concurrent: ✅
• Sizes: 10 TB, 50 TB.

The 🥈 and 🥉 places often change hands, but the 🥇 leader remains constant, with the performance difference often being 2x - 5x against the second-best solution in each workload. Sometimes this speed comes at the cost of using more RAM, but not always. If the gap between 🥇 and 🥈 is bigger than the gap within any consecutive pair of entries in the leaderboard, we mark the result with 🏅.

### 0: Bulk Initialization

Initializing the KVS is done via monotonically ascending keys. The original YCSB initialization always happens one key at a time. We went one step further and implemented a bulk insert functionality. When possible, it constructs big DB-compatible files externally and then submits them into KVS. WiredTiger, RocksDB and UnumDB natively support that, but LevelDB doesn’t.

This is vastly faster than inserting data one by one, or even in batches! Think of it as your zero-to-hero time: how fast can you import all your Parquet files from S3 buckets before you start working with them?

| Brand | CPU Usage | RAM Usage | Disk Usage | Speed |
| :--- | :--- | :--- | :--- | :--- |
| WiredTiger ⊃ MongoDB | 1.00 cores | 3.99 GiB | 989.77 GiB | 1.1M 🥈 |
| LevelDB ⊃ Google | 0.99 cores | 904.12 MiB | 984.19 GiB | 30.2K |
| RocksDB ⊃ Facebook | 0.89 cores | 2.65 GiB | 976.27 GiB | 385.3K 🥉 |
| UnumDB ⊃ Unum | 1.00 cores | 3.90 GiB | 968.58 GiB | 2.2M 🥇🏅 |

### A: 50% Random Reads + 50% Random Updates

Every operation is a random single-element operation. Half of them are reads, and half are updates for existing keys.

| Brand | CPU Usage | RAM Usage | Disk Usage | Speed |
| :--- | :--- | :--- | :--- | :--- |
| WiredTiger ⊃ MongoDB | 3.19 cores | 23.67 GiB | 1.02 TiB | 101.1K 🥉 |
| LevelDB ⊃ Google | 1.78 cores | 2.54 GiB | 989.39 GiB | 115.2K 🥈 |
| RocksDB ⊃ Facebook | 1.50 cores | 4.95 GiB | 978.55 GiB | 86.8K |
| UnumDB ⊃ Unum | 1.00 cores | 10.01 GiB | 976.32 GiB | 197.3K 🥇🏅 |

### C: Random Reads

Again, not a particularly interesting benchmark, but a common case when dealing with poorly optimized software. Use the batched approach whenever possible.

| Brand | CPU Usage | RAM Usage | Disk Usage | Speed |
| :--- | :--- | :--- | :--- | :--- |
| WiredTiger ⊃ MongoDB | 1.39 cores | 17.43 GiB | 989.77 GiB | 146.6K 🥉 |
| LevelDB ⊃ Google | 1.12 cores | 105.41 MiB | 984.40 GiB | 30.6K |
| RocksDB ⊃ Facebook | 0.97 cores | 4.72 GiB | 976.27 GiB | 160.7K 🥈 |
| UnumDB ⊃ Unum | 0.98 cores | 1.42 GiB | 972.30 GiB | 175.7K 🥇 |

### D: 95% Random Reads + 5% Random Inserts

Unlike A, this benchmark inserts new key-value pairs instead of updating previously existing ones.

| Brand | CPU Usage | RAM Usage | Disk Usage | Speed |
| :--- | :--- | :--- | :--- | :--- |
| WiredTiger ⊃ MongoDB | 2.03 cores | 18.48 GiB | 1.02 TiB | 182.4K 🥈 |
| LevelDB ⊃ Google | 1.85 cores | 232.08 MiB | 1.01 TiB | 20.1K |
| RocksDB ⊃ Facebook | 0.99 cores | 5.04 GiB | 1.00 TiB | 173.3K 🥉 |
| UnumDB ⊃ Unum | 1.01 cores | 17.71 GiB | 1022.78 GiB | 189.9K 🥇 |

### E: Range Select

Here we randomly select a key and then retrieve the following 100 values. One can easily change the scan length through settings and even define it through provided probability distributions.

| Brand | CPU Usage | RAM Usage | Disk Usage | Speed |
| :--- | :--- | :--- | :--- | :--- |
| WiredTiger ⊃ MongoDB | 0.38 cores | 4.54 GiB | 989.77 GiB | 250.2K 🥈 |
| LevelDB ⊃ Google | 0.32 cores | 50.79 MiB | 983.99 GiB | 236.3K 🥉 |
| RocksDB ⊃ Facebook | 0.34 cores | 4.72 GiB | 976.27 GiB | 177.3K |
| UnumDB ⊃ Unum | 0.52 cores | 344.49 MiB | 972.30 GiB | 384.3K 🥇🏅 |

### X: Batch Select

It is a benchmark of “batch selections”, where instead of submitting one read operation at a time and waiting for it synchronously, you request a batch of, let’s say, 256 random keys. It enables the DBMS to execute them in a batch-asynchronous fashion, reordering separate operations internally for speed.

This workload is significant for BI and analytical workloads, network/graph analysis being a perfect example. At every step of your dataset exploration, you are fetching data from very different parts of your dataset to provide insights, and this requires fast “batch selections”!

| Brand | CPU Usage | RAM Usage | Disk Usage | Speed |
| :--- | :--- | :--- | :--- | :--- |
| WiredTiger ⊃ MongoDB | 0.50 cores | 14.29 GiB | 989.77 GiB | 40.9K 🥉 |
| LevelDB ⊃ Google | 0.64 cores | 55.87 MiB | 983.99 GiB | 16.5K |
| RocksDB ⊃ Facebook | 0.35 cores | 9.07 GiB | 976.27 GiB | 51.8K 🥈 |
| UnumDB ⊃ Unum | 0.37 cores | 830.48 MiB | 972.30 GiB | 303.8K 🥇🏅 |

Moreover, today AI researchers mostly train their neural networks after uniformly shuffling the datasets. It is a simple approach, but is it the best strategy if we can rapidly sample batches?

### Y: Batch Insert

Instead of inserting one value at a time, like in C, we submit batches of up to 10'000 values, depending on the DB size. It's natively supported only by LevelDB and UnumDB.

This is different from a transaction with 256 new entries in it, as if one of the operations fails, we want the remaining to proceed. Transactions add the rollback functionality and will be the subject of future publications.

| Brand | CPU Usage | RAM Usage | Disk Usage | Speed |
| :--- | :--- | :--- | :--- | :--- |
| WiredTiger ⊃ MongoDB | 1.00 cores | 227.22 MiB | 1.02 TiB | 450.3K 🥉 |
| LevelDB ⊃ Google | 0.99 cores | 602.71 MiB | 1.11 TiB | 41.1K |
| RocksDB ⊃ Facebook | 1.71 cores | 5.63 GiB | 1.10 TiB | 690.2K 🥈 |
| UnumDB ⊃ Unum | 1.00 cores | 2.38 GiB | 1.04 TiB | 813.2K 🥇 |

### Z: Scan

It streams all the data present in the store, which may be later sampled and forwarded into some neural-network training procedure.

| Brand | CPU Usage | RAM Usage | Disk Usage | Speed |
| :--- | :--- | :--- | :--- | :--- |
| WiredTiger ⊃ MongoDB | 1.22 cores | 16.80 GiB | 989.77 GiB | 1.4M 🥈 |
| LevelDB ⊃ Google | 0.59 cores | 7.36 GiB | 983.99 GiB | 1.2M 🥉 |
| RocksDB ⊃ Facebook | 0.14 cores | 4.72 GiB | 976.27 GiB | 84.5K |
| UnumDB ⊃ Unum | 0.34 cores | 16.89 GiB | 968.58 GiB | 1.7M 🥇 |

When comparing LevelDB and RocksDB, we see their rivalry: one is a lot better for sequential scan-like workloads, while the other is better at randomized operations. The most astonishing result is that UnumDB is better at both! WiredTiger often performs well too, but almost always uses the most energy and memory to get there.

| Brand | 0 | A | C | D | E | X | Y | Z |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| WiredTiger | 🥈 | 🥉 | 🥉 | 🥈 | 🥈 | 🥉 | 🥉 | 🥈 |
| LevelDB |  | 🥈 |  |  | 🥉 |  |  | 🥉 |
| RocksDB | 🥉 |  | 🥈 | 🥉 |  | 🥈 | 🥈 |  |
| UnumDB | 🥇🏅 | 🥇🏅 | 🥇 | 🥇 | 🥇🏅 | 🥇🏅 | 🥇 | 🥇 |

This may not sound relevant, but it is. Most companies operate 2 data stores separately - one for small real-time transactions and another for big analytical queries. This means more engineers, complex synchronization and outdated analytics. We hope to replace that entire mess with a single solution.

## Is this the limit? ⚡

Far from it! We run those benchmarks with our hands essentially cuffed. Here is a broader picture:

• When put in a Docker container, most KVS slow down by 15%; UnumDB - by around 7%.
• On ARM, the gap between UnumDB and the second-best solution is generally more prominent than on x86, thanks to our adoption of the Neon instruction family.
• Multithreaded and transactional operations are again a lot faster, as we invest heavily in concurrent in-memory data structures that are far more performant and memory-efficient than a concurrent Skip-List can be.
• Initialization order has an unprecedented effect on the speed of the measured operations: when the keys aren't monotonically increasing, other DBs can often be 2x - 5x slower.
• With GPUs enabled, the gap becomes astronomical, but it's not a fair comparison, as GPUs draw more power: 400W per GPU vs 300W when fully loading a 64-core CPU.

So prepare to see broader benchmarks soon! Anyways, our KVS performance seems solid for now, and we are focusing on:

• horizontal scaling and cloud deployment,
• going from string values to a higher-level type system,
• supporting arbitrary JSON-like inputs,
• an SQL-compatibility layer,
• bindings for dynamic scripting languages.

It might seem too ambitious for a side-project of an AI-oriented team, but we are not stopping any time soon! It’s not just about examining more data faster. It’s also about studying the same data efficiently. In 2020 alone:

• data transmission networks consumed 260-340 TWh.
• data centre electricity use was 200-250 TWh.
• cryptocurrency mining consumed ~100 TWh.

Totalling at 600-700 TWh, or what the entire 1.26 Billion population of Africa consumed that same year, and over 100x the energy consumption of our home country.

With that in mind, we invite everyone to think twice about the tools we use. At least if we are genuinely committed to making the world a better place through software. It takes only a couple of search queries to realize that CuPy can be 1'000x faster than NumPy. Similarly, other startups modernize the streaming layer, like RedPanda replacing Apache Kafka. We have already announced our non-DBMS research directions for 2022, but all the solutions we are releasing this year target the storage layer. Any computer takes at least 100 microseconds to fetch anything from a fast SSD, meaning that accessing data can be over 200'000x more expensive than processing it. It’s the slowest piece of the modern computer, so our UnumDB should impact millions of applications widening their most critical bottleneck!

So far, I have been personally spending millions on our research since 2015. Offices, high-end servers, R&D teams, you name it 😋 Now, we are expanding further, attracting brilliant, inspiring researchers and organizing more technical conferences than ever. What previously felt like an impossible dream of making 🇦🇲 the second 🇮🇱, almost feels like a passed step! If you want to do research with us, join C++ conferences as a speaker or invest, just let us know! Anyways, who would need Oracle or MongoDB once there is UnumDB? 😉