11

We are working with a Cassandra database that will store data in the petabyte range. We are thinking of using either ElasticSearch or Solandra, but we are having a hard time deciding which to use. I'm wondering if our database might get too large. I know ElasticSearch is scalable, but to what extent, especially with a Cassandra database?

Solandra, on the other hand, is made for Cassandra and is highly scalable, but again, to what extent?

Both are scalable, but how scalable are they when used with Cassandra?

sbridges
Henry
  • 1
    Have a look at this presentation that kimchy (the ElasticSearch lead developer) made at Berlin Buzzwords 2011: http://berlinbuzzwords.de/sites/berlinbuzzwords.de/files/elasticsearch-bbuzz2011.pdf – DrTech Aug 04 '11 at 19:08

2 Answers

4

Solandra is being used in the 10s of Terabytes range.

Are you saying you want to index a PB of data in Solandra, or a subset? I think if you want 1 big index with a PB of data, you are stretching the limits. But if you want a PB of indexes, then this will scale the same as Cassandra.

How many nodes are you planning to run? How much disk per node?
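
To make the node and disk question concrete, here is a rough back-of-envelope sketch. Every number in it is an illustrative assumption and not a figure from this thread: a replication factor of 3, 4 TB of disk per node, and filling disks to only 50% to leave headroom. The helper name `nodes_needed` is hypothetical, and the estimate ignores index overhead and compression entirely.

```python
import math


def nodes_needed(data_tb, replication_factor, disk_per_node_tb, max_fill=0.5):
    """Rough node count needed to hold data_tb of raw data.

    All parameters are assumptions for illustration:
      data_tb            -- raw data size in terabytes
      replication_factor -- number of copies kept of each row
      disk_per_node_tb   -- raw disk capacity per node in terabytes
      max_fill           -- fraction of each disk you are willing to fill
    """
    total_tb = data_tb * replication_factor
    usable_per_node_tb = disk_per_node_tb * max_fill
    return math.ceil(total_tb / usable_per_node_tb)


# 1 PB (1024 TB) of raw data, 3 replicas, 4 TB disks filled to 50%
print(nodes_needed(1024, 3, 4))  # 1536
```

Plugging in your own node count and disk size gives a quick sanity check on whether a single PB-sized data set is realistic for the cluster you have in mind.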

tjake
  • This pretty much answers my question. "10s of Terabyte range" is really what I was asking. ALSO: – Henry Jun 18 '11 at 03:02
  • Does Solandra store documents as they are (in rows of a column family, for example), with the Lucene index containing only pointer information, or are documents bound into (stored with) the index itself (which, of course, is stored in Cassandra)? If it ends up being a dumb/unclear question, my apologies in advance. – Henry Jun 18 '11 at 03:03
1

Have a look at this nice discussion:

http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/3f99e682887f98e4

Karussell