
Which is the better NoSQL database for most applications?


Both Cassandra (0.7.x) and Membase:

  • Are key-value databases
  • Are fast
  • Are horizontally scalable
  • Can be coupled with Hadoop for MapReduce processing
  • Support increment and decrement
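Server-side increment and decrement matter because a counter that clients update with a separate read and write can lose updates when two clients interleave. The sketch below uses a hypothetical single-node `ToyStore` class (not the Cassandra or Membase client API) and replays that interleaving deterministically:

```python
import threading


class ToyStore:
    """Hypothetical single-node key-value store (not Cassandra's or Membase's API)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        return self._data.get(key, 0)

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def incr(self, key, delta=1):
        # Server-side atomic read-modify-write: no client can interleave here.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + delta
            return self._data[key]

    def decr(self, key, delta=1):
        return self.incr(key, -delta)


def client_side_increments(store, key):
    # Two clients each read first, then each write: one update is lost.
    a = store.get(key)
    b = store.get(key)
    store.set(key, a + 1)
    store.set(key, b + 1)
    return store.get(key)


def server_side_increments(store, key):
    # The same two increments, done by the store itself.
    store.incr(key)
    store.incr(key)
    return store.get(key)


racy = client_side_increments(ToyStore(), "page_views")
atomic = server_side_increments(ToyStore(), "page_views")
print(racy, atomic)  # 1 2
```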

Cassandra has selectable per-query durability/consistency guarantees.
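Per-query consistency is usually reasoned about with the quorum-overlap rule from Dynamo-style systems: if the number of replicas a read waits for (R) plus the number a write waits for (W) exceeds the replication factor (N), the two replica sets must share at least one node. This is a simplified sketch that models only the `ONE`, `QUORUM` and `ALL` levels and ignores node failures, hinted handoff and read repair; the function names are hypothetical:

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must answer at a given level; rf is the replication factor."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"level not modeled in this sketch: {level}")


def read_overlaps_write(read_level: str, write_level: str, rf: int) -> bool:
    # If R + W > N, every read replica set intersects every write replica set,
    # so the read reaches at least one node holding the latest acknowledged write.
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, read_overlaps_write(read_level, write_level, rf=3))
```

With a replication factor of 3 this prints `False` for ONE/ONE and `True` for QUORUM/QUORUM and ONE/ALL, which is why pairing QUORUM reads with QUORUM writes is a common choice when reads must see the latest write.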

Cassandra supports BigTable-style columns.

Membase has asynchronous (immediate-return) writes.


Beyond the consistency guarantees, why would you choose one over the other?

MartysMind
  • There are other products that could be better or worse than the above. Any reason to single out those two? – Phill Jan 10 '11 at 05:22
  • For real-time queries with high availability and simple scale-out, they seem to be the front-runners. They both promise fast, low-latency writes as well as reads, with a simple, homogeneous, single-process scale-out. – MartysMind Jan 10 '11 at 05:27
  • Cassandra is geared more towards writes than reads, though... – Phill Jan 10 '11 at 06:10
  • Cassandra has asynchronous (immediate return) writes - just select `ConsistencyLevel.ZERO`. – yfeldblum Jan 10 '11 at 21:39

3 Answers


Cassandra offers rows broken up into columns that can be indexed, updated efficiently and independently (instead of having to rewrite the whole row/object), and used as materialized views (unlike relational rows, Cassandra column names can be determined dynamically at runtime).
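A minimal in-memory sketch of that data model may help: each row maps column names to values, a write touches only the column it names, and column names can themselves be data chosen at write time (here, post IDs stored under a tag to form a materialized view). `ToyColumnFamily` and `column_slice` are hypothetical names that only mimic the model; they are not a Cassandra client API:

```python
from collections import defaultdict


class ToyColumnFamily:
    """Hypothetical in-memory model: row key -> {column name: value}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        # Only this one column is written; the rest of the row is untouched,
        # so there is no need to read and rewrite the whole object.
        self.rows[row_key][column] = value

    def column_slice(self, row_key, start=None, finish=None):
        # Columns come back sorted by name, as a column family's comparator orders them.
        cols = self.rows.get(row_key, {})
        return [
            (name, cols[name])
            for name in sorted(cols)
            if (start is None or name >= start) and (finish is None or name <= finish)
        ]


# Ordinary row: update one field without rewriting the others.
users = ToyColumnFamily()
users.insert("alice", "email", "a@example.com")
users.insert("alice", "city", "Austin")
users.insert("alice", "email", "alice@example.com")

# Materialized view: one row per tag, one column per post ID, built at write time.
posts_by_tag = ToyColumnFamily()
for post_id, tags in [("2011-01-10:p1", ["nosql"]), ("2011-01-11:p2", ["nosql", "cassandra"])]:
    for tag in tags:
        posts_by_tag.insert(tag, post_id, "")

print(users.column_slice("alice"))
print(posts_by_tag.column_slice("nosql", start="2011-01-11"))
```

Because the view row is sorted by column name, prefixing post IDs with a date lets a range slice answer "posts with this tag since a given day" without scanning other rows.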

Cassandra offers fully multi-master replication across multiple datacenters, configurable per keyspace. (E.g., I want 3 copies of data set X in my North America datacenter and 1 copy in Europe, but only 2 copies of data set Y, both in North America.)
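To make that example concrete, the sketch below models the two replication layouts as plain dictionaries and computes how many replicas a majority (quorum) needs, both across all datacenters and within one datacenter. The datacenter names and helper functions are hypothetical; the majority formula floor(n/2) + 1 is the usual quorum definition, and in Cassandra per-datacenter copy counts like these are configured through `NetworkTopologyStrategy`:

```python
# Hypothetical per-keyspace replication settings: datacenter name -> number of copies.
KEYSPACES = {
    "X": {"north_america": 3, "europe": 1},
    "Y": {"north_america": 2},
}


def total_replicas(dc_replicas):
    return sum(dc_replicas.values())


def quorum(dc_replicas):
    # Majority of all replicas, counted across every datacenter.
    return total_replicas(dc_replicas) // 2 + 1


def local_quorum(dc_replicas, dc):
    # Majority of the replicas inside a single datacenter only.
    return dc_replicas[dc] // 2 + 1


for name, dcs in KEYSPACES.items():
    print(name, total_replicas(dcs), quorum(dcs), local_quorum(dcs, "north_america"))
```

For data set X a cluster-wide quorum needs 3 of the 4 copies, which the North America datacenter can satisfy on its own, while a quorum local to North America needs only 2.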

It's incorrect to say that "Cassandra is geared more towards writes than reads." The difference is that both are very fast with Cassandra, unlike most systems that are only fast at reads.

FWIW, Cassandra used to offer asynchronous writes, but we took them out because when you reach the limit of your capacity, your choices are (1) running the server into the ground or (2) dropping requests without telling the client that this is what happened. That isn't worth the very small performance increase.
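The trade-off can be illustrated with a bounded write queue. In this minimal sketch (a hypothetical `ToyNode`, not how Cassandra is implemented internally), a fire-and-forget write that finds the node full is simply discarded, whereas a write that waits for an answer gets an explicit error the client can act on:

```python
import queue


class Overloaded(Exception):
    """Tells the client its write was NOT accepted, so it can back off or retry."""


class ToyNode:
    """Hypothetical node with a bounded write queue."""

    def __init__(self, capacity):
        # queue.Queue treats maxsize <= 0 as unbounded; that would be choice (1),
        # letting the backlog grow until the server falls over.
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.pending = queue.Queue(maxsize=capacity)
        self.silently_dropped = 0

    def write_fire_and_forget(self, item):
        # "Immediate return" write: if the queue is full the item is discarded
        # and the caller is never told (choice 2).
        try:
            self.pending.put_nowait(item)
        except queue.Full:
            self.silently_dropped += 1

    def write_with_feedback(self, item):
        # Same capacity limit, but a full queue becomes an error the client sees.
        try:
            self.pending.put_nowait(item)
        except queue.Full:
            raise Overloaded("node at capacity; write rejected") from None


async_node = ToyNode(capacity=2)
for i in range(3):
    async_node.write_fire_and_forget(i)
print("silently dropped:", async_node.silently_dropped)  # silently dropped: 1

sync_node = ToyNode(capacity=2)
sync_node.write_with_feedback("a")
sync_node.write_with_feedback("b")
try:
    sync_node.write_with_feedback("c")
except Overloaded as err:
    print("client told:", err)
```

Both nodes accept exactly as much work; the only difference is whether the overflow is visible to the client.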

jbellis
  • Does that mean Cassandra writes synchronously, depending on the consistency level? Another question: under what conditions can some nodes hold an inconsistent state (old values) for more than, e.g., 1 minute? And what do you think about Megastore; will it have any influence on Cassandra? – sirmak Jan 11 '11 at 02:05
  • *googles* Everything I read says the same; http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis says "Best used: When you write more than you read (logging)." Granted, I've only used RavenDB and MongoDB. – Phill Jan 11 '11 at 03:59
  • Phil, the article you cite is the blind leading the blind. – jbellis Jan 12 '11 at 05:24
  • I actually like Phill's article, but there are several inconsistencies. Most articles do not go deep enough to understand the real trade-offs, which is similar to the problem with quite a few benchmarks. As for the misconception concerning writes, I believe it boils down to the fact that none of us like to show our product in a bad light. Instead of stating that Cassandra's reads tend to be a bit slower due to concerns for consistency, writes are characterized as being exceedingly fast. – MartysMind Jan 12 '11 at 05:51
  • In-memory databases such as Mongo are faster than Cassandra, but sacrifice durability. Mongo can be made more durable, but it would then sacrifice its selling point of speed. From what I can see, Membase does not allow a per-operation durability setting. This makes sense since it inherits from memcached. – MartysMind Jan 12 '11 at 05:55

Membase has recently merged with CouchDB, and will be switching its disk persistence layer from SQLite to CouchDB, giving Membase the ability to do map/reduce and querying/indexing.
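To show what map/reduce querying adds, here is a minimal Python sketch of a CouchDB-style view: a map function emits key/value pairs from each document, the results are kept sorted by key, and an optional reduce function folds the values for each key. CouchDB itself defines views as JavaScript functions stored in design documents; `run_view`, `levels_by_game` and the sample player documents are hypothetical and only imitate the idea:

```python
from itertools import groupby
from operator import itemgetter


def run_view(docs, map_fn, reduce_fn=None):
    """Toy map/reduce view over docs, a dict of {doc_id: document dict}."""
    emitted = []
    for doc_id, doc in docs.items():
        # The map function may emit zero or more (key, value) pairs per document.
        for key, value in map_fn(doc):
            emitted.append((key, doc_id, value))
    # The resulting index is kept sorted by key (doc_id breaks ties).
    emitted.sort(key=lambda row: (row[0], row[1]))
    if reduce_fn is None:
        return [(key, value) for key, _, value in emitted]
    # Grouped reduce: one reduced value per distinct key.
    return [
        (key, reduce_fn([value for _, _, value in rows]))
        for key, rows in groupby(emitted, key=itemgetter(0))
    ]


def levels_by_game(doc):
    # Emit nothing for documents that lack the fields this view indexes.
    if "game" in doc and "level" in doc:
        yield doc["game"], doc["level"]


players = {
    "p1": {"game": "farm", "level": 3},
    "p2": {"game": "poker", "level": 7},
    "p3": {"game": "farm", "level": 5},
}

print(run_view(players, levels_by_game))       # [('farm', 3), ('farm', 5), ('poker', 7)]
print(run_view(players, levels_by_game, sum))  # [('farm', 8), ('poker', 7)]
```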

One thing no one has mentioned yet is that Membase clusters are miraculously easy to set up, whereas Cassandra takes more sysadmin work.

Cassandra is also more widely adopted so far, though Membase has some key deployments, such as Zynga and its social games.

Manto
  • "One thing no one has mentioned yet is that Membase clusters are miraculously easy to setup" ... especially on Mac, just like Couch – sbartell Aug 10 '11 at 08:36

This is really a simplistic question. Why are you not also comparing Riak, CouchDB, Hadoop, and others?

There is no such thing as the NoSQL database that is best for most applications. Tokyo Tyrant is great for some things. SQLite is an excellent database that can be scaled if you know what you are doing.

The whole point of NoSQL is to deconstruct the monolithic RDBMS and provide stripped-down database tools that focus on the aspects of database access that are bottlenecks for YOUR application. Every application is different, and therefore there is no such thing as a best choice.

There is, however, a best strategy: identify the raw performance needs of your application, find where the bottlenecks will be, and choose database tools (maybe NoSQL, maybe RDBMS) that address those bottlenecks and help you manage them.

The blogosphere is filled with stories of people who started with the same simplistic question and ended up making the wrong choices. If you want the right answer, you need to start by asking the right question, and sometimes you need to wake up and smell the coffee and realize that your application is just hard to manage from a technical perspective. Others have discovered that scaling problems can be solved better by the business people, but the precondition is that the technical folks must be able to explain the system, its bottlenecks and natural constraints, and the opportunities to scale more easily in certain ways if only the business would move in a different direction.

Michael Dillon