2021 was terrific for database management software (DBMS) startups, and for startups in general. While classical SQL is shrinking, the data-management market as a whole is booming at a 17% CAGR and, according to Gartner, will reach $150 Billion in 2026. That growth, plus the hype, allowed dozens of DBMS startups to raise more capital last year alone than in their entire preceding decade-long history. For 13 companies in our previous extended comparison, it meant swallowing $4.
Very few consider C++ attractive, and hardly anyone thinks it’s easy. Choosing it for a project generally means you care about the performance of your code. And rightly so! Today’s machines can process hundreds of Gigabytes per second, and we, as developers, should all learn to saturate those capabilities. So let’s look at a few simple code snippets and familiarize ourselves with Google Benchmark (GB) - the most popular micro-benchmarking library in the C++ world.
A bit of history. Not so long ago, we tried to use GPU acceleration from Python. We benchmarked NumPy vs CuPy on the most common number-crunching tasks. We took the highest-end desktop CPU and the highest-end desktop GPU and put them to the test. The GPU, expectedly, won, but not just at Matrix Multiplications. Sorting arrays, finding medians, and even simple accumulations were vastly faster. So we implemented multiple algorithms for parallel reductions in C++ and CUDA, just to compare their efficiency.
Last time we realized how easy GPU acceleration can be for Python users. Follow the CUDA installation steps carefully, replace import numpy as np with import cupy as np, and you will often get 100x performance boosts without breaking a sweat. Every time you write magical one-liners, remember that a systems engineer somewhere is making your dreams come true. Let’s take a short break from our Python series and see what their job is like!
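The drop-in swap above fits in a few lines. This is an illustrative sketch, not code from the article: the `xp` alias is a common community convention, and the try/except fallback to NumPy is added here so the snippet also runs on machines without a GPU or CuPy installed:

```python
# Drop-in replacement pattern: one import line decides CPU vs GPU.
# Assumes CuPy and a matching CUDA toolkit are installed; otherwise
# we fall back to NumPy, which exposes the same API for this snippet.
try:
    import cupy as xp   # GPU-backed arrays
except ImportError:
    import numpy as xp  # CPU fallback

a = xp.arange(1_000, dtype=xp.float64)
total = float(a.sum())  # identical call on both backends
print(total)            # 499500.0 = 0 + 1 + ... + 999
```

Everything downstream of the import stays unchanged, which is exactly why the swap feels magical - and why someone had to make the two APIs line up in the first place.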
As this is the first article in the series, let’s start with a somewhat generic introduction. For raw benchmarks, jump here. Intro to CuPy Many groups benefit from GPUs: gamers, miners, AI researchers… but surprisingly few can program them. For a good reason - complexity! On CPUs, everything is easy. Most of the code is portable. Threads rarely communicate with each other. And when they do, there are only a few that you need to synchronize.
We haven’t published any articles this year yet, as we wanted to start with something big. So here we begin a series of posts and live meetups on General-Purpose Computing on Graphics Processing Units. Or, GPGPU, for short. Contents The intention was to benchmark the most commonly used libraries for data science and suggest the most relevant tools for 2022. Here is what we came up with: NumPy vs CuPy, here. Deeper investigation into the C++/CUDA land: 879 GB/s Reductions in C++ and CUDA, here.
I recently read an article by Andy Pavlo, one of the most famous people in the database world, reflecting on the database market in 2021. He calls today “the golden age of DataBase Management Software” (DBMS), and he is right! In one year, many of the startups in the space have raised more than in their entire previous decade-long history! When you see yet another piece of news about a DBMS startup, its rounds, and its promises - you read it, you agree with it, and you forget it.
This will be a story about many things: about computers, about their (memory) speed limits, about very specific workloads that can push computers to those limits, and about the subtle differences in Hash-Table (HT) designs. But before we dive in, here is a glimpse of what we are about to see. A friendly warning: the following article contains many technical terms and is intended for somewhat technical and hopefully curious readers.
A single software company can spend over 💲10 Billion a year on data centres, but not every year is the same. When all the stars align, we see bursts of new technologies reaching the market simultaneously, thus restarting the purchasing super-cycle. 2022 will be just that, so let’s jump a couple of quarters ahead and see what’s on the shopping list of your favourite hyperscaler! Friendly warning: this article is full of technical terms and jargon, so it may be hard to read if you don’t write code or haven’t assembled computers before.
Read the newest version here! Everyone in the tech industry has probably heard the words “data lake”, “data warehouse”, “database management software”, but have you ever wondered how all of that is built? According to Gartner, the DBMS market size was estimated at 💲65 Billion in 2020 and will reach 💲150 Billion in 2025. This article is about the technology that underpins that whole industry - about how modern IT infrastructure is created.