61

I'm looking to start using a key/value store for some side projects (mostly as a learning experience), but so many have popped up in the recent past that I've got no idea where to begin. Just listing from memory, I can think of:

  1. CouchDB
  2. MongoDB
  3. Riak
  4. Redis
  5. Tokyo Cabinet
  6. Berkeley DB
  7. Cassandra
  8. MemcacheDB

And I'm sure that there are more out there that have slipped through my search efforts. With all the information out there, it's hard to find solid comparisons between all of the competitors. My criteria and questions are:

  1. (Most Important) Which do you recommend, and why?
  2. Which one is the fastest?
  3. Which one is the most stable?
  4. Which one is the easiest to set up and install?
  5. Which ones have bindings for Python and/or Ruby?

Edit:
So far it looks like Redis is the best solution, but that's only because I've gotten one solid response (from ardsrk). I'm looking for more answers like his, because they point me in the direction of useful, quantitative information. Which Key-Value store do you use, and why?

Edit 2:
If anyone has experience with CouchDB, Riak, or MongoDB, I'd love to hear your experiences with them (and even more so if you can offer a comparative analysis of several of them).

Claudio
Mike Trpcic
  • If you consider Berkeley DB, search the web. There were a few reports of lost data in that system. – Sergey Mar 18 '10 at 13:16

15 Answers

26

Which do you recommend, and why?

I recommend Redis. Why? Continue reading!!

Which one is the fastest?

I can't say whether it's the fastest, but Redis is fast. It's fast because it holds all the data in RAM. A virtual memory feature was added recently, but all keys still stay in main memory, with only rarely used values being swapped to disk.

Which one is the most stable?

Again, since I have no direct experience with the other key-value stores I can't compare. However, Redis is being used in production by many web applications like GitHub and Instagram, among many others.

Which one is the easiest to set up and install?

Redis is fairly easy to set up. Grab the source and, on a Linux box, run make install. This yields a redis-server binary that you can put on your path and start.

redis-server binds to port 6379 by default. Have a look at the redis.conf file that comes with the source for more configuration and setup options.

Which ones have bindings for Python and/or Ruby?

Redis has excellent Ruby and Python support.

In response to Xorlev's comment below: Memcached is just a simple key-value store. Redis supports complex data types like lists, sets and sorted sets and at the same time provides a simple interface to these data types.
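To give a feel for those data types from Python, here is a minimal sketch using the redis-py client. It assumes a Redis server running locally on the default port 6379, and the key names are purely illustrative.

    # Minimal redis-py sketch (pip install redis); assumes a local Redis
    # on the default port 6379. Key names are illustrative.
    import redis

    r = redis.Redis(host="localhost", port=6379)

    r.set("greeting", "hello")            # plain string value
    r.rpush("queue", "job1", "job2")      # list: append two items
    r.sadd("tags", "python", "ruby")      # set: add members
    r.zadd("scores", {"alice": 42})       # sorted set: member with score

    print(r.get("greeting"))              # b'hello'
    print(r.lrange("queue", 0, -1))       # [b'job1', b'job2']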

There is also make 32bit that makes all pointers only 32-bits in size even on 64 bit machines. This saves considerable memory on machines with less than 4GB of RAM.

ardsrk
  • If you just want to hold everything in memory, I'd go with memcached. – Xorlev Mar 04 '10 at 05:43
  • This is the best answer so far. Redis looks awesome. – Mike Trpcic Mar 04 '10 at 16:53
  • @Mike: Salvatore (http://twitter.com/antirez), the creator of Redis, is very thoughtful and open to new ideas. Look at his tweet stream and you'll see why Redis is attracting more users and developers. – ardsrk Mar 04 '10 at 17:15
  • You can actually just run "make", no need to "make install". – Don Spaulding Mar 04 '10 at 17:44
  • @don: you are right. The other typically used option is `make noopt` to generate a binary without optimizations. This helps to run redis under gdb. – ardsrk Mar 04 '10 at 17:56
  • I wish redis had an option to store stuff on disk; I have a dataset bigger than my memory :( – Umair Jabbar Jan 20 '13 at 23:31
  • In my testing, MySQL with a single key-value table was even faster than Redis, both with default options. – Soyoes Dec 13 '13 at 11:43
24

You need to understand what the modern NoSQL phenomenon is about.
It is not about key-value storage. Key-value stores have been available for decades (BerkeleyDB, for example). So why all the fuss now?

It is not about fancy document- or object-oriented schemas and overcoming the "impedance mismatch". Proponents of those features have been touting them for years and they got nowhere.

It is simply about addressing three technical problems: automatic (for maintainers) and transparent (for application developers) failover, sharding, and replication. Thus you should ignore any trendy products that do not deliver on this front, which includes Redis, MongoDB, CouchDB, etc., and concentrate on truly distributed solutions like Cassandra, Riak, etc.

Otherwise you'll lose all the good stuff SQL gives you (ad-hoc queries, Crystal Reports for your boss, third-party tools and libraries) and get nothing in return.

Vetle
Vagif Verdi
  • As mentioned in my comment on OverClocked's post, BigCouch brings automatic sharding, failover and replication to the CouchDB world. I believe MongoDB has sharding and replication, as well. – user359996 Oct 04 '10 at 04:13
  • Calling MongoDB trendy is IMHO a little harsh. It's easy, fast and reliable to get up and running with MongoDB, not to mention the gain in development speed and fun when switching from a traditional RDBMS. I wouldn't put any emphasis on sharding and replication for fun/side projects. And yes, MongoDB does have replication, sharding and automatic fail-over. – Matt Jun 23 '11 at 08:54
  • Although this was posted some time ago and things may have changed, going to http://www.mongodb.org/ reveals that auto-sharding is listed as a standard feature... – Sologoub Sep 14 '11 at 20:03
8

At this year's PyCon, Jeremy Edberg of Reddit gave a talk:

http://pycon.blip.tv/file/3257303/

He said that Reddit uses Postgres as a key-value store, presumably with a simple two-column table; according to his talk, it had benchmarked faster than any other key-value store they had tried. And, of course, it's very mature.

Ultimately, OverClocked is right; your use case determines the best store. But RDBMSs have long been (ab)used as key-value stores, and they can be very fast, too.
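For the curious, here is a rough sketch of what a key-value layer over Postgres can look like in Python with psycopg2. The table and column names follow Kylotan's comment below and are illustrative, not Reddit's actual schema.

    # Rough sketch of a key-value layer over PostgreSQL using psycopg2.
    # Table/column names are illustrative, not Reddit's actual schema.
    # The upsert syntax requires PostgreSQL 9.5+.
    import psycopg2

    conn = psycopg2.connect("dbname=test")
    with conn, conn.cursor() as cur:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS data "
            "(name varchar(40) PRIMARY KEY, value text)"
        )
        # set: insert or overwrite the value for a key
        cur.execute(
            "INSERT INTO data (name, value) VALUES (%s, %s) "
            "ON CONFLICT (name) DO UPDATE SET value = EXCLUDED.value",
            ("user:1", "mike"),
        )
        # get: look the value back up by key
        cur.execute("SELECT value FROM data WHERE name = %s", ("user:1",))
        print(cur.fetchone()[0])   # 'mike'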

AdamKG
  • I saw that talk when it was posted on Reddit, but couldn't find any solid examples of using Postgres as a KVS in the way Reddit does. – Mike Trpcic Mar 04 '10 at 16:18
  • Start with `CREATE TABLE data (name varchar(40), value text);` and see how far you can go... – Kylotan Mar 06 '10 at 22:19
  • ^ Great comment. I'd gone beyond: stored objects with their types and attribute values. – Eduardo Jun 03 '11 at 00:45
7

They all have different features. And don't forget Project Voldemort, which is actually used and tested by LinkedIn in production before each release.

It's hard to compare. You have to ask yourself what you need: e.g., do you want partitioning? If so, then some of them, like CouchDB, won't support it. Do you want erasure coding? Then most of them don't have that. And so on.

Berkeley DB is a very basic, low-level storage engine that can perhaps be excused from this discussion. Several key-value systems are built on top of it to provide additional features like replication, versioning, coding, etc.

Also, what does your application need? Several of the solutions contain complexity that may not be necessary. For example, if you just store static data that won't change, you can store it under the data's SHA-1 content hash (i.e. use the content hash as the key). In that case, you don't have to worry about freshness, synchronization, or versioning, and a lot of complexity goes away.
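To make the content-hash idea concrete, here is a tiny illustration in Python; the in-memory dict is just a stand-in for whatever key-value backend you pick.

    # Content-addressed storage: the key is the SHA-1 hash of the value,
    # so immutable data never needs versioning or invalidation.
    # The dict is a stand-in for any key-value backend.
    import hashlib

    store = {}

    def put(data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        store[key] = data
        return key   # the content behind this key can never change

    key = put(b"static asset contents")
    assert store[key] == b"static asset contents"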

user
OverClocked
  • Note CouchDB now has Lounge and BigCouch for partitioning. The latter is based on Amazon's Dynamo clustering scheme, so you get all that fun variable durability, replication and quorum nonsense, as well. – user359996 Oct 04 '10 at 04:11
  • Berkeley DB isn't that basic and it integrates nicely with all languages and its performance is solid. –  Nov 21 '13 at 15:17
7

I've been playing with MongoDB, and it has one thing that makes it perfect for my application: the ability to store complex maps/lists in the database directly. I have a large map where each value is a list, and I don't have to do anything special to write and retrieve it without knowing all the different keys and list values. I don't know much about the other options, but the speed and that ability make Mongo perfect for my application. Plus, the Java driver is very simple to use.
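For a Python flavor of the same idea (the answer above uses the Java driver), here is a small pymongo sketch. It assumes a MongoDB server on localhost:27017, and the database, collection, and field names are made up for illustration.

    # Small pymongo sketch; assumes MongoDB on localhost:27017.
    # Database, collection, and field names are made up for illustration.
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    coll = client.testdb.things

    # A map whose values are lists goes in as-is: no schema, no special handling.
    coll.insert_one({"_id": "ratings", "alice": [1, 2, 3], "bob": [4, 5]})

    doc = coll.find_one({"_id": "ratings"})
    print(doc["bob"])   # [4, 5]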

MattGrommes
  • Are there any posts comparing MongoDB to CouchDB? Couch also allows you to have complex maps/lists in the database (as JavaScript functions), and I'm wondering which is the faster/more stable of the two. – Mike Trpcic Mar 04 '10 at 16:35
  • From a couple of benchmarks that I have seen recently, Mongo is much, much faster than Couch; Mongo even beat out MySQL under certain conditions. – Astra Mar 04 '10 at 17:28
  • Reddit has had a couple of links about this: http://www.reddit.com/r/programming/comments/atnpb/what_are_the_merits_of_couchdb_over_mongodb_and/ http://jayant7k.blogspot.com/2009/08/document-oriented-data-stores.html – MattGrommes Mar 04 '10 at 17:52
  • Maybe you can clarify what you mean? Storing lists is trivial in any document database, not just MongoDB, no? – user359996 Oct 04 '10 at 04:15
6

One distinction you have to make is what you will use the DB for. Don't jump on board just because it's trendy. Do you need a key-value store, or do you need a document-based store? What is your memory footprint requirement? Will you run it on a small VM or a separate one?

I recommend listing your requirements first and then seeing which ones overlap with your requirements.

With that said, I have used CouchDB and MongoDB and prefer MongoDB for its ease of setup and the smoothest transition from MySQL-style queries. I chose MongoDB over SQL because of dynamic schemas (no migration files!) and better data modeling (arrays, hashes). I did not evaluate based on scalability.

MongoMapper is a great MongoDB object mapper for Ruby, and there's already a working Rails 3 fork.

I listed some more details about why I preferred MongoDB in my Scribd slides: http://tommy.chheng.com/index.php/2010/02/mongodb-for-natural-development/

tommy chheng
  • There are no requirements, I just want to LEARN, and getting opinions from people who have already learned is the best way to find a solid footing to make a jumping off point. – Mike Trpcic Mar 04 '10 at 16:52
  • I think learning by trying is a good start. I would suggest thinking of a sample app, say a Twitter app, and trying to model the data architecture and queries in each of the respective languages. You don't even need to code; just see what the queries are like for "followers of followers", etc. This will give you insight into how easy it will be to use. – tommy chheng Mar 04 '10 at 17:44
  • This should have been the accepted answer. Do your research and select the best tool that fits your requirements! For session replication for example you can even look at tools like Hazelcast or Infinispan. – rustyx Mar 06 '13 at 10:11
6

I notice how everyone is confusing memcached with MemcacheDB. They are two different systems. The OP asked about MemcacheDB.

memcached is an in-memory cache; MemcacheDB uses Berkeley DB as its datastore.

drr
5

I only have experience with Berkeley DB, so I'll mention what I like about it.

  • It is fast
  • It is very mature and stable
  • It has outstanding documentation
  • It has C, C++, Java & C# bindings out of the box. Other language bindings are available. I believe Python comes with bindings as part of its "batteries" (see the sketch below).

The only downside I've run into is that the C# bindings are new and don't seem to support every feature.
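On the Python bindings mentioned above: the old bsddb module shipped with Python 2's standard library, and on Python 3 the equivalent is the separate bsddb3 package. A minimal sketch, assuming bsddb3 is installed:

    # Minimal Berkeley DB sketch via the bsddb3 package (assumed installed).
    # Keys and values are bytes; the file name is illustrative.
    import bsddb3

    db = bsddb3.hashopen("example.db", "c")    # create/open a hash-backed file
    db[b"greeting"] = b"hello"                 # dict-like writes
    print(db[b"greeting"])                     # b'hello'
    db.close()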

Ferruccio
  • +1 for BDB. It scales well, is fast enough for around 1 MB/sec of transactions, and is very robust. – Jack Apr 11 '10 at 07:44
4

There is also ZODB.
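If you haven't seen it, ZODB is an object database for Python whose root object behaves like a persistent dict, so it can double as a simple key-value store. A minimal sketch, assuming the ZODB package is installed (the file name is illustrative):

    # Minimal ZODB sketch: the root object acts like a persistent dict.
    # Assumes the ZODB package is installed; the file name is illustrative.
    import ZODB, ZODB.FileStorage
    import transaction

    storage = ZODB.FileStorage.FileStorage("data.fs")
    db = ZODB.DB(storage)
    conn = db.open()
    root = conn.root()
    root["answer"] = 42      # store a key/value pair
    transaction.commit()     # persist it
    conn.close()
    db.close()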

mikerobi
4

Which key value store is the most promising/stable?

G-WAN KV store looks rather promising:

DB engine            Traversal
-----------          ----------------------------
SQLite               0.261 ms  (b-tree)
Tokyo-Cabinet (TC)   4.188 ms  (hash table)
TC-FIXED             0.103 ms  (fixed-size array)
G-WAN KV             0.010 ms  (unnamed)

Also, it is used internally by the G-WAN web server, which is known for its high-concurrency performance (that speaks to the stability question).

Bert
  • Finally, some real benchmarks. I would like to see a slightly more detailed comparison (Berkeley, SQLite, Revoscaler, MongoDB, Couchbase, Cassandra, HBase, Kyoto Tycoon, Hypertable, Redis..). I need to know the best option for working on Windows with 1TB files to perform quick calculations on spatio-temporal series on a single machine. I prefer something not made with Java. – skan Apr 04 '13 at 13:29
  • Screw G-WAN and the snake oil it perpetuates. http://tomoconnor.eu/blogish/gwan-snakeoil-beware/#.UnK6-pTfxvY – Tom O'Connor Oct 31 '13 at 20:18
3

I really like memcached personally.

I use it on a couple of my sites and it's simple, fast, and easy. It really was just incredibly simple to use, and the API is straightforward. It doesn't store anything on disk (hence the name memcached), so it's out if you're looking for a persistent storage engine.

Python has python-memcached.
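A quick sketch of what that looks like, assuming a memcached daemon running locally on the default port 11211 (the key name and TTL are illustrative):

    # Quick python-memcached sketch; assumes a local memcached daemon
    # on the default port 11211. Key name and TTL are illustrative.
    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])

    mc.set("session:42", {"user": "mike"}, time=300)   # expires after 5 minutes
    print(mc.get("session:42"))                        # {'user': 'mike'}, or None once evicted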

I haven't used the Ruby client, but a quick Google search reveals RMemCache.

If you just need a caching engine, memcached is the way to go. It's well developed, it's stable, and it's bleedin' fast. There's a reason LiveJournal made it and Facebook develops it. It's in use at some of the largest sites out there to great effect, and it scales extremely well.

Xorlev
2

Cassandra seems to be popular.

Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX, and more companies that have large, active data sets. The largest production cluster has over 100 TB of data in over 150 machines.

yfeldblum
  • Definitely a *lot* of momentum behind this project, but very severe design decisions may make it difficult to use for certain tasks. Unclear how this will play out in the long run, particularly as far as relevance/usability to/for smaller (i.e. non-worldwide-scale) users. – user359996 Oct 04 '10 at 04:19
  • @user359996 - Could you briefly list some of the design decisions? – spenthil Feb 17 '11 at 19:20
1

As the others said, it always depends on your needs. I, for example, prefer whatever suits my applications best.

I first used memcached to get fast read/write access. As the Java API I used SpyMemcached, which comes with a very easy interface for writing and reading data. Due to memory pressure (no more RAM) I had to look for another solution; I also was not able to scale properly, since just increasing the memory of a single process did not seem like a good approach.

After some reviewing I found Couchbase, which comes with replication, clustering, auto-failover, and a community edition (Windows, Mac OS, Linux). The best thing for me was that its Java client also implements SpyMemcached, so I had almost nothing to do except set up the server and use Couchbase instead of memcached as the datastore. The advantage? My data is now persistent, replicated, and indexed. It comes with a web console for writing map-reduce functions in Erlang for document views.

It has support for Python, Ruby, .NET, and more, plus easy configuration through the web console and client tools, and it runs stably. In some tests I was able to write about 10k records per second for 200-400 byte records; read performance was much higher (both tested locally). Have a lot of fun making your decision.

Alex M
1

I only have experience with MongoDB, memcached, and Redis. Here's a comparison between them and CouchDB.

MongoDB seems to be the most popular. It supports sharding and replication, is eventually consistent, and has good Ruby support (Mongoid). It also has a richer feature set than the other two. MongoDB, Redis, and memcached can all keep key-value data in memory, but Redis seems to be much faster; according to this post, Redis is about 2x faster on writes and 3x faster on reads than MongoDB. It has better-designed data structures and is more lightweight.

I would say they have different uses: MongoDB is probably good for large datasets and document storage, while memcached and Redis are better suited to caches or logs.

Bruce Lin
1

Just to make the list complete: there's Dreamcache, too. It's compatible with Memcached (in terms of protocol, so you can use any client library written for Memcached), it's just faster.

grokk