Why big companies use Mnesia instead of using Riak or CouchDB

Question

I can see that two big companies, Klarna and WhatsApp, use Mnesia as their in-memory database (I'm not sure how they persist data given Mnesia's 2 GB limit). My question is: why do companies like those, and maybe others I don't know of, use Mnesia instead of Riak or CouchDB? Both are written in Erlang, and both offer faster in-memory storage, less painful persistence, and many more features. Am I missing something here?

Actually Klarna uses both Mnesia and Riak. – joaomilho Jun 22 '14 at 23:00

score 43 · Accepted Answer · answered Apr 20 '14 at 14:06

You are missing a number of important points:

First of all, Mnesia has no 2 gigabyte limit. It is limited on 32-bit architectures, but hardly any of those are used for real work anymore. On 64-bit, you are not limited to 2 gigabytes. I have seen databases on the order of several hundred gigabytes. The only problem is the initial start-up time for those.

Mnesia is built to handle:

- Very low latency K/V lookup, not necessarily linearizable.
- Proper transactions with linearizable changes (C in the CAP theorem). These are allowed to run at a much worse latency, as they are expected to be relatively rare.
- On-line schema change.
- Survival even if nodes fail in a cluster (where the cluster is smallish, say 10-50 machines at most).

The design is such that you avoid a separate process since data is in the Erlang system already. You have QLC for datalog-like queries. And you have the ability to store any Erlang term.
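To make the points above concrete, here is a minimal sketch of a dirty (non-transactional) lookup, a transaction, and a QLC query against a RAM-only table on a single node. The `session` record, its fields, and the module name are hypothetical and invented for this illustration. It assumes Mnesia is not already running and that no `session` table exists yet.

```erlang
-module(mnesia_sketch).
-export([demo/0]).

%% qlc:q/1 is implemented as a parse transform, so this include is required.
-include_lib("stdlib/include/qlc.hrl").

%% Hypothetical record, invented for this illustration.
-record(session, {id, data}).

demo() ->
    %% With no schema on disk, Mnesia starts with a RAM-only schema.
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(session,
        [{attributes, record_info(fields, session)},
         {ram_copies, [node()]}]),

    %% Transactional writes. The data field holds an arbitrary Erlang term.
    {atomic, ok} = mnesia:transaction(fun() ->
        mnesia:write(#session{id = 1,
                              data = #{user => <<"alice">>, roles => [admin]}}),
        mnesia:write(#session{id = 2,
                              data = #{user => <<"bob">>, roles => []}})
    end),

    %% Dirty read: no transaction and no locks, so it is fast but gives
    %% no isolation guarantees.
    [#session{data = Data}] = mnesia:dirty_read(session, 1),

    %% QLC query over the table, run inside a transaction.
    {atomic, AdminIds} = mnesia:transaction(fun() ->
        qlc:e(qlc:q([S#session.id
                     || S <- mnesia:table(session),
                        lists:member(admin, maps:get(roles, S#session.data))]))
    end),
    {maps:get(user, Data), AdminIds}.
```

Calling `mnesia_sketch:demo()` in a fresh Erlang shell returns the user stored under key 1 together with the ids of the sessions that have the `admin` role. Everything runs inside the calling Erlang node, with no external database process involved.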

Mnesia fares well if the above is what you need. Its limits are:

- You can't get a machine with more than 2 terabytes of memory, and loading 2 terabytes from scratch is going to be slow.
- Since it is a CP system and not an AP system, the loss of nodes requires manual intervention. You may also not need transactions. You might also want to be able to seamlessly add more nodes to the system, and so on. For this, Riak is a better choice.
- It uses optimistic locking, which gives trouble if many processes try to access the same row in a transaction.

My normal go-to trick is to start out with Mnesia in Erlang systems and then switch over to another system as the data size grows. If the data size grows slowly, you can keep everything in memory in Mnesia and get up and running extremely quickly.

I GIVE CRAP ANSWERS

Oh, and look up the old mnesia system description paper which explains exactly what it was built for! – I GIVE CRAP ANSWERS Apr 20 '14 at 14:07
Very good explanation. From what you said above, engineers generally prefer to keep data in the same process rather than jumping to another process (Riak provides a very low latency in-memory database written in Erlang and can persist to hard disk). My question is: have you seen any company use Mnesia for disk persistence larger than 4GB? I still find it hard to imagine how Mnesia is good at persistence. – securecurve Apr 20 '14 at 20:20
After reading your great answer, I understand that Mnesia is great as an in-memory DB (backed by ETS). My question in short: how can Mnesia be used for reliable persistence, like Riak or CouchDB? – securecurve Apr 20 '14 at 20:28
Mnesia has disc copies. Wouldn't that suffice? – Apr 21 '14 at 06:03
kadaj: the question is: how big it can take, and how quick it can recover from a failure in case of big tables. – securecurve Apr 21 '14 at 07:36
It has to read the disk copy into memory before bringing that table on-line. This takes O(n) time given that n is the number of bytes stored. The way you usually avoid this is by having multiple machines ready to serve the same data set, which plugs the reboot hole unless the cluster as a whole is down. – I GIVE CRAP ANSWERS Apr 21 '14 at 20:48
@IGIVECRAPANSWERS, is that practical/production-friendly? Do you know of any company that uses Mnesia (in production) with disk nodes bigger than 2GB (or 4GB)? – securecurve Apr 21 '14 at 21:30
@securecurve - We do at my company. We have one table that crosses the 4GB boundary and several others that come close. When the network partitions or a server goes down, it's not pretty and it can take a bit of time and effort to recover, but it does recover. "Production friendly" is a relative term; mnesia suffices for us, but we also don't have many instances of network partitioning or crashing servers. – Soup d'Campbells Mar 31 '15 at 20:25

score 10 · Answer 2 · edited May 23 '17 at 12:01

As for persistent storage capacity in Mnesia, "the 2 GB limit for disk tables" is a common misconception. Read the post "What is the storage capacity of a Mnesia database?" very attentively. There are no actual limits on Mnesia disk table size.

- Mnesia is free, unlike Riak (for commercial usage).
- Read about the CAP theorem. You can build your own CA, CP, or AP database using plain Mnesia as a backend. But a particular DBMS, say CouchDB, is designed to be AP out of the box, and you can't make it, say, CA (as far as I know).
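On the disk-capacity point, the storage type chosen per table is what matters. Below is a minimal sketch, assuming a single node on which Mnesia has not been started yet. The `account` and `archive` records and the module name are hypothetical. `disc_copies` keeps the table in RAM and logs changes to disk, while `disc_only_copies` keeps the table in DETS files, which is where the often-quoted size limit comes from.

```erlang
-module(mnesia_storage_sketch).
-export([setup/0]).

%% Hypothetical records, invented for this illustration.
-record(account, {id, balance}).
-record(archive, {id, payload}).

setup() ->
    %% Disc-based tables need an on-disk schema, created before Mnesia starts.
    %% The result is ignored because the schema may already exist.
    _ = mnesia:create_schema([node()]),
    ok = mnesia:start(),

    %% Held in RAM, with changes logged to disk.
    {atomic, ok} = mnesia:create_table(account,
        [{attributes, record_info(fields, account)},
         {disc_copies, [node()]}]),

    %% Held on disk only, backed by DETS.
    {atomic, ok} = mnesia:create_table(archive,
        [{attributes, record_info(fields, archive)},
         {disc_only_copies, [node()]}]),

    ok = mnesia:wait_for_tables([account, archive], 5000),
    [{T, mnesia:table_info(T, storage_type)} || T <- [account, archive]].
```

The returned list pairs each table with its storage type, which is a quick way to check which kind of disk table you are actually running before worrying about size limits.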

answered Apr 20 '14 at 15:07

Oleksandr Khryplyvenko

Please note that storing data on disk above 4GB is reported to cause problems, but Mnesia will not fail; and I haven't heard of anyone using Mnesia over 4GB in production systems, have you? – securecurve Apr 20 '14 at 19:55
Well, to be exact, Riak IS free for commercial usage. There is, however, an enterprise version available that adds multi-datacenter replication. – Jon Gretar Apr 21 '14 at 02:34
@securecurve , personally - I didn't. – Oleksandr Khryplyvenko Apr 21 '14 at 06:52
You can't make any database CA. CP or AP (or, frequently, effectively neither) are your only options. – macintux Apr 21 '14 at 12:20
@macintux The CAP concept is a bit new to me. Why can't I make CA systems? – securecurve Apr 22 '14 at 16:57
@securecurve http://codahale.com/you-cant-sacrifice-partition-tolerance/ – macintux Apr 23 '14 at 15:03
Riak has been Apache 2 licensed since 2013 http://basho.com/riak-cs-is-now-open-source/ – Ashley Dec 12 '14 at 15:49
@securecurve "Please note that storing data on Disk above 4GB is reported to cause problems" <- you are probably mixing up disc_only_copies and disc_copies. The former are stored in DETS, which has this limit; disc_copies, however, are stored log-based using http://erlang.org/doc/man/disk_log.html, which has no such limit. – Peer Stritzinger Sep 11 '17 at 15:52

score 5 · Answer 3 · edited Apr 21 '14 at 11:36

As far as I can tell, CouchDB does not support in-memory databases (for Riak, see the note about Bitcask in the comments). I could be wrong on Riak, but I work on CouchDB, so I am very sure about that part.

Engineers are choosing mnesia over Riak or CouchDB because it solves a different problem.

Whether they are big companies is no factor in this.

answered Apr 20 '14 at 13:26

Jan Lehnardt

You are right actually, Riak doesn't support in-memory databases. But what type of problems does Mnesia solve (other than being an in-memory DB) that Riak and CouchDB don't, keeping in mind the storage limitations (regardless of the debate over 2GB versus unlimited storage) and the long load time? – securecurve Apr 20 '14 at 19:52
You are not totally right: there is a backend used by Riak called Bitcask, which allows in-memory storage with very low access latency. You can check this: http://basho.com/hello-bitcask/ – securecurve Apr 20 '14 at 20:15
Riak most certainly supports in-memory via [the memory backend](http://docs.basho.com/riak/latest/ops/advanced/backends/memory/). Bitcask is disk-based storage backend with the *keys* in memory, relying on the OS disk cache for values. – Brian Roach Apr 21 '14 at 02:44
Brian Roach: true, I just figured that out, the in memory storage in Riak is also based on ETS tables as per the link you provided. – securecurve Apr 21 '14 at 07:40
