Over the past six years, our team has worked its way through
thousands of recent scientific papers and performed millions of internal
benchmarks to build what is now
The Most Efficient Data-Processing Software Ecosystem Ever Built!
UnumDB is a persistent distributed transactional ACID NoSQL DBMS.
Available as Managed DataBase Software as a Service.
Faster than other databases in both read and write operations -
it's the only datalake software your business will ever need!
Unum Kernel Library is our compute engine. It replaces BLAS and powers
everything from basic statistical queries to advanced Deep Learning routines.
It analyzes insane volumes of data and can integrate with PyTorch,
TensorFlow, Apache Kafka and most other third-party tools.
UnumAI is a suite of our pre-trained State-of-the-Art Deep Neural Networks
available via REST API.
Training a network like GPT-3 can cost over $25 Million and requires
hundreds of terabytes of training data. We went through the pain of baking these networks,
so you don't have to!
YCSB has been the industry-standard Key-Value Store evaluation benchmark since 2010. It simulates the most common DB operations and was used to evaluate MongoDB, Elasticsearch, Snowflake, Oracle NoSQL Database, Postgres and Redis, among other household names.
SNAP is the go-to place for everyone analyzing graphs and networks. It contains a broad set of massive datasets, including social networks, citation networks, logistical networks, brain networks, gene-expression graphs, online reviews, user interactions and many more. For this case study, we picked the US Patent citation network.
Since 2009 Uber has released billions of data points about Taxi and Uber rides in New York City. This helps data scientists better understand the transportation market and explore the city's neighbourhoods, nightlife, and traffic from an entirely new angle.
Given the sheer size of the dataset, it has become a popular target among High-Performance engineers. Today, only purely analytical databases like BrytlytDB and OmniSci shine in those workloads. Unfortunately, their universal solution for every problem is to throw more money at servers for horizontal scaling. UnumDB is different.
Twitter is arguably the best social network for data mining. Loved and used by developers all over the planet, everyone wants to access its API to analyze social dynamics, sentiment around polarising topics and even to predict future stock prices!
We share the love for that dataset and consider it a perfect example of real-world web-application data. Every Tweet is a hierarchical JSON document with roughly 120 nested fields. It includes strings, integers, real numbers, geo coordinates, hashtags, UTF-8 codes and other fuzzy components, making it a common, accessible and fair benchmarking choice!
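To make "hierarchical JSON document" concrete, here is a minimal sketch of how such a document can be flattened into dotted field paths before indexing or counting fields. The sample document is a hypothetical, heavily trimmed stand-in (not the real Tweet schema), and `flatten` is an illustrative helper, not part of any UnumDB SDK.

```python
import json


def flatten(doc, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths.

    Note: empty dicts and lists contribute no keys in this sketch.
    """
    if isinstance(doc, dict):
        items = doc.items()
    elif isinstance(doc, list):
        items = enumerate(doc)
    else:
        # Scalar leaf: store it under the path accumulated so far.
        return {prefix: doc}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


# Hypothetical tweet-like document, invented for illustration only.
raw = """
{"id": 1, "text": "hello #world",
 "user": {"name": "ada", "followers": 42},
 "coordinates": {"point": [40.7, -74.0]},
 "entities": {"hashtags": [{"text": "world"}]}}
"""
fields = flatten(json.loads(raw))
for path, value in fields.items():
    print(path, "=", value)
```

A real Tweet would yield on the order of a hundred such paths, mixing strings, integers, floats and coordinates, which is what makes it a demanding document-store workload.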
CommonCrawl is by far the biggest publicly available database in history. It's a dump of billions of HTML web pages scraped from all over the internet. Today that dataset is used to train the biggest Transformer Neural Networks for Natural Language Processing. GPT-3, the most famous model making headlines since 2020, was trained on just a fraction of that data!
GPT-3 looked at only 410 Billion tokens from CommonCrawl, yet its training is estimated to cost over $25 Million. It required 285,000 CPU cores, 10,000 GPUs and absurd networking bandwidth across the training cluster. By integrating UnumDB with UnumKL, we can significantly reduce training costs while also accelerating the entire process.
No matter what kind of Deep Learning you do, your intermediate representations live in some high-dimensional space. Every point there is a vector, and you must be able to search for its nearest neighbours as fast as possible. That's how modern Semantic Search works at Google, Bing, Yandex, Baidu and everywhere else.
This kind of functionality is crucial for building intelligent services. That's why Facebook has made FAISS. Unfortunately, their system stores the indexes in memory and can only use GPUs for quantization, not the whole pipeline. This limits its applicability to billion-point clusters or, more realistically, million-point clusters. UnumDB doesn't have those bottlenecks!
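For readers new to the problem, nearest-neighbour search in its simplest form looks like the sketch below: an exact linear scan over every vector. This is only a baseline for illustration, assuming Euclidean distance and tiny 2-D points; it is not how FAISS or UnumDB work internally, and `nearest_neighbours` is a hypothetical helper name.

```python
import heapq
import math


def nearest_neighbours(query, points, k=1):
    """Return indices of the k points closest to `query` (Euclidean distance).

    Exact brute-force scan: cost grows linearly with the number of points,
    which is exactly why large collections need a dedicated index.
    """
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: math.dist(query, points[i])
    )


# Toy 2-D vectors; real embeddings have hundreds of dimensions.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
closest = nearest_neighbours((1.0, 1.0), points, k=2)
print(closest)  # [1, 3]
```

At a million or a billion vectors, scanning everything per query is no longer practical, and approximate indexes trade a little accuracy for large speedups.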
UnumDB is a persistent distributed NoSQL DBMS. It ships in all shapes and sizes, perfectly tailored for the most common workloads. Still, all of them preserve the following properties.
UnumDB bridges the gap between on-disk Big-Data processing and in-memory High-Throughput processing. We
are capable of processing 10x to 100x more data than other solutions without classical read-write
tradeoffs.
Your SSD streams 4 GB/s, but your DBMS bottlenecks it down to 10 MB/s.
Stop overpaying.
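A quick back-of-the-envelope calculation shows what that gap means in practice. The sketch below assumes a hypothetical 1 TB dataset and uses the two throughput figures quoted above; the dataset size and the `scan_seconds` helper are illustrative assumptions, not measurements.

```python
MB = 10**6
GB = 10**9
TB = 10**12


def scan_seconds(dataset_bytes, throughput_bytes_per_s):
    """Time needed to stream a dataset once at a sustained throughput."""
    return dataset_bytes / throughput_bytes_per_s


dataset = 1 * TB  # assumed dataset size, for illustration only
at_ssd_speed = scan_seconds(dataset, 4 * GB)    # raw SSD bandwidth
at_dbms_speed = scan_seconds(dataset, 10 * MB)  # bottlenecked DBMS

print(f"SSD-bound scan:  {at_ssd_speed:.0f} s")
print(f"DBMS-bound scan: {at_dbms_speed / 3600:.1f} h")
print(f"Slowdown:        {at_dbms_speed / at_ssd_speed:.0f}x")
```

Under these assumptions, a scan that the hardware could finish in about four minutes takes more than a day, a 400x gap.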
UnumDB can efficiently manage over a petabyte of textual, graphical, or other data on a single server.
Where competitors need a fleet of machines, we need just a few.
The math is simple:
3x fewer servers
means 3x lower costs!
Somehow, most DBMS brands still focus on ancient SQL-like text-based interfaces. It's not 1990 anymore,
so we moved on to Python, the most popular programming language on the planet!
Our SDK mimics the most popular Python libraries.
You don't have to learn how to use our tools: you already know!
No matter where your data resides, we will help it find its way home to UnumDB.
And once you need to sample it or send it somewhere else, every standard exchange format is supported.
Export to Apache Parquet, JSON, CSV or XML formats
or enable Apache Arrow for zero-copy in-memory transfers!
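As a rough illustration of what exporting the same records to two of these formats involves, here is a minimal sketch using only the Python standard library. It does not use the UnumDB SDK; the sample rows and the `to_json` and `to_csv` helpers are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample records, invented for illustration.
rows = [
    {"id": 1, "city": "New York", "fare": 12.5},
    {"id": 2, "city": "Boston", "fare": 8.0},
]


def to_json(records):
    """Serialize records as a JSON array of objects."""
    return json.dumps(records)


def to_csv(records):
    """Serialize records as CSV, taking column names from the first record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_json(rows))
print(to_csv(rows))
```

Text formats like these are convenient for sampling and sharing, while columnar formats such as Parquet and Arrow are the better fit for bulk analytical transfers.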
Our projects have a deeply scientific core, but we also value simplicity! You don't have to know how a nuclear power plant works to use electricity, but if you are still curious - Read Our Blog!
Fully managed, global cloud database on AWS, Azure, and GCP. Available across 100 availability zones all around the planet.
For small serverless applications.
Minimal configuration required.
For big production applications.
Advanced configuration controls.
For learning and exploring UnumDB.
Basic configuration options.