Ask Sawal

Discussion Forum

Which in-memory database?

3 Answer(s) Available
Answer # 1 #

In-memory databases are purpose-built databases that rely primarily on memory for data storage, in contrast to databases that store data on disk or SSDs.

Debbie Panduru
Applied Science
Answer # 2 #

An in-memory database (IMDB) stores computer data in a computer’s main memory instead of a disk drive to produce quicker response times. Accessing data stored in memory eliminates the time needed to query data from a disk. In-memory databases are used by applications that depend on rapid response times and real-time data management. Industries that benefit from in-memory databases include telecommunications, banking, travel and gaming. An in-memory database is also referred to as a main memory database (MMDB), real-time database (RTDB) or in-memory database system (IMDS).

An in-memory database keeps all its data in the random access memory (RAM) of a computer. Only the main memory is accessed when querying data, which allows faster access to that data than a disk-based system can provide.

The downside is the volatility of RAM. The data is lost when an in-memory database crashes. The development of non-volatile random access memory (NVRAM) can help in-memory databases maintain data after a power loss or crash. Flash is one example, but a major drawback is the limit to how many times flash memory can be erased and rewritten. NVRAM chips are being developed that provide a more persistent memory than flash.

Data storage in an in-memory database relies on a computer's random access memory (RAM) or main memory instead of traditional disk drives. Data can be loaded into an in-memory database in a compressed, non-relational format, yet it is held in a directly usable form, so queries are not slowed by decompression or decryption. This allows direct navigation from an index to a row or column; when used as an analytic store, the system is typically read-only.

The speed of an in-memory database comes from avoiding translation and caching layers: the data is kept in the same form in which the application uses it. Data access is managed by an in-memory database management system.

An in-memory database system can also act as a read-only analytic database that stores historical data on metrics for business intelligence (BI) applications. This can eliminate the need for data indexing, which reduces IT costs. Multi-core servers, 64-bit computing and lower RAM prices have made in-memory analytics more common.

Applications that manage vast quantities of data and require rapid response times can benefit from in-memory database architecture. The data analytics industry increasingly relies on in-memory database systems.

The benefits of an in-memory database include:

- Faster reads and writes, because queries never wait on disk access
- Real-time analysis and reporting of data
- Predictable, rapid response times for applications that depend on them

In-memory databases are commonly used for:

- Real-time data management in telecommunications, banking, travel and gaming
- Business intelligence (BI) and analytics workloads
- Applications that handle large volumes of data and require rapid response times

Comparing an in-memory database with a traditional disk-based database comes down to speed, volume and volatility. An in-memory database keeps all data in the main memory or RAM of a computer. A traditional database retrieves data from disk drives.

In-memory databases are faster than traditional databases because they require fewer CPU instructions. They also eliminate the time it takes to access data from a disk.

In-memory databases are more volatile than traditional databases because data is lost when power is lost or the machine crashes. Data can be more easily restored from the disks of a traditional database.

In a traditional database, the layout of the data is dictated by the disk blocks on which it is read and written. When one part of the database refers to another part, a different block must be read from the disk. With an in-memory database, different parts of the database can be reached through direct pointers.
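
To make the pointer idea concrete, here is a minimal sketch in Python (hypothetical record names, purely illustrative): in memory, related records are reached by following object references directly, while a disk-based layout would force the engine to look up and read another block.

# Sketch: in memory, a "foreign key" can simply be an object reference.
class Customer:
    def __init__(self, name):
        self.name = name
        self.orders = []          # direct pointers to related records

class Order:
    def __init__(self, customer, amount):
        self.customer = customer  # direct pointer back to the customer
        self.amount = amount
        customer.orders.append(self)

alice = Customer("Alice")
Order(alice, 42.0)

# Following the relationship is just pointer traversal, no block reads:
total = sum(o.amount for o in alice.orders)

# A disk-based layout would instead store an ID and fetch another block,
# conceptually: order_row = read_block(index.lookup(order.customer_id))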

In-memory databases allow for real-time analysis and reporting of data.

Traditional database deployments also tend to store redundant data, because the system creates a copy of the data for each component added to the system.

To reduce the risk of losing data in a power outage or crash, an in-memory database can be backed by non-volatile random access memory (NVRAM), which preserves data after a power loss. Flash is commonly used for this despite its cost and the limit on how many times flash memory can be erased and rewritten.

Reading a database from flash is still faster than using a disk drive, and NVRAM chips are being developed that provide more durable persistent memory than flash.

Cunningham Burkey
Structure Maintainer
Answer # 3 #

Hey guys.

Probably you’ve heard about in-memory databases. If not, then check out a quick overview of what they are: https://en.wikipedia.org/wiki/In-memory_database

To make the long story short, an in-memory database is a database that keeps the whole dataset in RAM. What does that mean? It means that each time you query a database or update data in a database, you only access the main memory. So, there's no disk involved in these operations. And this is good, because the main memory is way faster than any disk. A good example of such a database is Memcached.
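
To make that concrete, here is a minimal sketch in Python, standing in for a real engine like Memcached (the class and keys are made up for illustration): every get and set touches only RAM, and everything disappears when the process exits.

# Minimal purely in-memory key-value store (a stand-in for Memcached).
class InMemoryKV:
    def __init__(self):
        self.data = {}             # the whole dataset lives in RAM

    def set(self, key, value):
        self.data[key] = value     # no disk involved

    def get(self, key):
        return self.data.get(key)  # served straight from main memory

db = InMemoryKV()
db.set("user:1", "Alice")
print(db.get("user:1"))            # "Alice"
# Kill or restart the process and the data is gone: nothing was persisted.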

But wait a minute, how would you recover your data after a machine with an in-memory database reboots or crashes? Well, with just an in-memory database, there’s no way out. A machine is down — the data is lost. Forget about it.

Is it possible to combine the power of in-memory data storage and the durability of good old databases like MySQL or Postgres? Sure! Would it affect the performance? Surprisingly, it won’t!

Here come in-memory databases with persistence like Redis, Aerospike, Tarantool.

You may ask: how can in-memory storage be persistent? The trick here is that you still keep everything in memory, but additionally you persist each operation on disk in a transaction log.

The first thing that you may notice is that even though your fast and nice in-memory database has got persistence now, queries don't slow down, because they still hit only the main memory like they did with just an in-memory database. Good news! :-) But what about updates? Each update (or let's call it a transaction) should not only be applied to memory but also persisted on a slow disk. Is that a problem?

Transactions are applied to the transaction log in an append-only way. What is so good about that? When addressed in this append-only manner, disks are pretty fast. If we’re talking about spinning magnetic hard disk drives (HDD), they can write to the end of a file as fast as 100 Mbytes per second. If you don’t believe me, then try to run this test in a Unix/Linux/MacOS shell:

cat /dev/zero >some_file

and check out how fast some_file is growing. So, magnetic disks are pretty fast when you use them sequentially. On the other hand, they’re utterly slow when you use them randomly. They can normally complete around 100 random operations per second.

If you write byte-by-byte, each byte put in a random place on an HDD, you can see a real throughput of about 100 bytes per second in this scenario. Again, that is as little as 100 bytes per second! This tremendous 6-order-of-magnitude difference between the worst case (100 bytes per second) and the best case (100,000,000 bytes per second) of disk access speed comes from the fact that seeking a random sector requires a physical movement of the disk head, while sequential access needs no seek: you just read the data as the platter spins under a stationary head.
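
If you want to see the gap yourself without trusting the numbers above, here is a rough, hedged timing sketch in Python (exact figures will vary wildly with hardware and OS caching; file names and sizes are arbitrary). It writes the same amount of data twice, once at consecutive offsets like a log append and once at random offsets, forcing each write to the device:

import os, random, time

N, CHUNK = 500, 100                       # 500 writes of 100 bytes each

def timed_writes(path, offsets):
    with open(path, "r+b") as f:
        start = time.time()
        for off in offsets:
            f.seek(off)
            f.write(b"x" * CHUNK)
            f.flush()
            os.fsync(f.fileno())          # force the write to reach the device
        return N * CHUNK / (time.time() - start)

# Pre-create a file big enough for the random offsets.
with open("test.bin", "wb") as f:
    f.truncate(N * CHUNK * 100)

seq = timed_writes("test.bin", [i * CHUNK for i in range(N)])              # append-like
rnd = timed_writes("test.bin", [random.randrange(0, N * CHUNK * 99)
                                for _ in range(N)])                        # scattered
print(f"sequential: {seq:,.0f} bytes/sec, random: {rnd:,.0f} bytes/sec")

On a spinning disk the random version pays a head seek per write, which is exactly the worst case described above; on an SSD the gap is smaller but still clearly visible.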

If we consider solid-state drives (SSD), then the situation will be better because of no moving parts. But still the best/worst ratio is similar — sequential access gives 200–300 Mbytes per second, and random access gives 1000–10000 seeks per second — that is 4–5 orders of magnitude. Here are some numbers: https://en.wikipedia.org/wiki/IOPS.

So, what our in-memory database does is it floods the disk with transactions as fast as 100 Mbytes per second. Is that fast enough? Well, that’s real fast. Say, if a transaction size is 100 bytes, then this will be one million transactions per second! This number is so high that you can definitely be sure that the disk will never be a bottleneck for your in-memory database. Here is a detailed benchmark of one in-memory database showing one million transactions per second, where the bottleneck is the CPU, not the disk: https://gist.github.com/danikin/a5ddc6fe0cedc6257853.
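
Here is a minimal, hedged sketch of the idea in Python (not how Redis or Tarantool actually implement it; the file name and format are made up): reads are served from a dict in RAM, and every update is both appended to a transaction log and applied in memory.

import json, os

class PersistentKV:
    """In-memory store with an append-only transaction log (write-ahead style)."""

    def __init__(self, log_path="kv.log"):
        self.data = {}                        # the whole dataset stays in RAM
        self.log_path = log_path
        if os.path.exists(log_path):          # recovery: replay the log on startup
            with open(log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value
        self.log = open(log_path, "a")

    def get(self, key):
        return self.data.get(key)             # reads never touch the disk

    def set(self, key, value):
        self.log.write(json.dumps([key, value]) + "\n")   # sequential append
        self.log.flush()
        os.fsync(self.log.fileno())           # make the transaction durable
        self.data[key] = value                # then apply it in memory

db = PersistentKV()
db.set("user:1", "Alice")
print(db.get("user:1"))                       # served from RAM
# After a crash or restart, the constructor replays kv.log and "Alice" is back.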

To summarize all that was said above about disks and in-memory databases:

- Disks are fast when accessed sequentially and painfully slow when accessed randomly.
- An in-memory database with persistence touches the disk only sequentially, through an append-only transaction log.
- Queries are still served entirely from main memory, so the disk never becomes a bottleneck.

Why wouldn’t regular disk-based databases adopt the same techniques? Well, first, unlike in-memory databases, they need to read data from disk on each query (let’s forget about caching for a minute, this is going to be a topic for another article).

You never know what the next query will be, so you can consider that queries generate random access workload on a disk, which is, remember, the worst scenario of disk usage. Second, disk-based databases need to persist changes in such a way that the changed data could be immediately read.

This is unlike in-memory databases, which usually don't read from disk at all except for recovery on startup. Disk-based databases therefore require specific data structures to avoid a full scan of a transaction log when reading the dataset. One such data structure is the B/B+ tree. The flip side of this data structure is that you have to modify the B/B+ tree on each change operation, which can generate a random write workload on the disk. So while B/B+ trees improve the performance of read operations, they degrade the performance of write operations. There are a handful of database engines based on B/B+ trees, for example InnoDB in MySQL and the Postgres storage engine.

There is also another data structure that handles write workloads somewhat better: the LSM tree. This modern data structure doesn't solve the problem of random reads, but it partially solves the problem of random writes. Examples of such engines are RocksDB, LevelDB and Vinyl.
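
For intuition only, here is a toy Python sketch of the LSM idea (hypothetical file names, nothing like a production LSM engine): writes go to an in-memory memtable and are flushed to disk sequentially as sorted files, so writes stay sequential, but a read may have to check several files.

import json

class TinyLSM:
    """Toy LSM-style store: sequential flushes, multi-file reads."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}               # recent writes, held in RAM
        self.sstables = []               # sorted files flushed to disk, newest last
        self.memtable_limit = memtable_limit

    def set(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        path = f"sstable_{len(self.sstables)}.json"
        with open(path, "w") as f:       # one sequential write per flush
            json.dump(dict(sorted(self.memtable.items())), f)
        self.sstables.append(path)
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:         # newest data first
            return self.memtable[key]
        for path in reversed(self.sstables):   # then newer files before older ones
            with open(path) as f:
                table = json.load(f)
            if key in table:
                return table[key]        # a read may touch several files
        return None

db = TinyLSM()
db.set("a", 1); db.set("b", 2)           # triggers a flush to sstable_0.json
db.set("a", 3)
print(db.get("a"), db.get("b"))          # prints: 3 2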

So, in-memory databases with persistence can be real fast on both read/write operations. I mean, as fast as pure in-memory databases, using a disk extremely efficiently and never making it a bottleneck.

P.S.

The last but not least topic that I want to partially cover here is snapshotting. Snapshotting is the way transaction logs are compacted. A snapshot of a database state is a copy of the whole dataset. A snapshot and latest transaction logs are enough to recover your database state. So, having a snapshot, you can delete all the outdated transaction logs that don’t have any new information on top of the snapshot.
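
Extending the earlier log-based sketch (again hedged and not engine-specific; file names are made up), snapshotting can be as simple as dumping the in-memory dict to a file and then truncating the log; on startup you load the snapshot first and replay whatever the log accumulated since.

import json, os

def snapshot(db, snap_path="kv.snapshot"):
    """Dump the whole in-memory dataset, then compact the transaction log."""
    tmp = snap_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(db.data, f)             # copy of the whole dataset
        f.flush(); os.fsync(f.fileno())
    os.replace(tmp, snap_path)            # atomically publish the snapshot
    db.log.truncate(0)                    # older log entries are now redundant
    db.log.seek(0)

def recover(snap_path="kv.snapshot", log_path="kv.log"):
    """A snapshot plus the newer log entries are enough to rebuild the state."""
    data = {}
    if os.path.exists(snap_path):
        with open(snap_path) as f:
            data = json.load(f)
    if os.path.exists(log_path):
        with open(log_path) as f:
            for line in f:
                key, value = json.loads(line)
                data[key] = value         # replay transactions made after the snapshot
    return data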

Monalisa Roy
STEAK TENDERIZER MACHINE