16

My team has asked me to choose between Cassandra and Solr for faster responses to front-end queries. I told them that Cassandra is a NoSQL database while Solr is an indexing engine. But then they say that we could push our complete database into Solr (using Solr as the database), or use Cassandra together with Solr. I'm confused.

The amount of data we are dealing with is around 1 billion, spread over 4 MySQL tables (fetched using joins), and we get only read queries from the website. We don't need FULL-TEXT SEARCH.

I think the one thing in which Solr cannot easily be beaten is its full-text search, but we don't need that in our case.

So what else does Solr have that Cassandra cannot provide, and what does Cassandra have that would let it replace Solr in our particular case?

In other words, which is going to perform better? Cassandra alone? Solr as a database alone? Or both together? And most importantly, why and why not?

It's really important for me to back up my choice with strong points about why one is better than the other at my next team meeting.

And thanks in advance.

EDIT:

  • Solandra is not an option because it is not that mature and, I guess, no longer maintained
  • DataStax is not an option because the Solr feature is provided only in the Enterprise Edition
codersofthedark
  • 1
    @Xodarap how is it a huge problem? You can easily have strong consistency guarantees if you need them. You get to pick (per-operation) how many replicas to wait for a response from: http://www.datastax.com/docs/1.0/dml/data_consistency – Tyler Hobbs Apr 18 '12 at 00:31
  • @Tyler: Facebook switched to hbase [due partially to its simpler consistency model](http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html) and I know that increasing consistency for us in Cassandra led to latency issues. I have no doubt that this can be worked around, but it's worth considering. – Xodarap Apr 18 '12 at 14:11
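
A small sketch of the trade-off discussed in the two comments above: with a replication factor N, a read waiting on R replicas and a write waiting on W replicas overlap on at least one replica whenever R + W > N. The level names ONE, QUORUM and ALL are real Cassandra consistency levels; the helper names below are hypothetical and only illustrate the arithmetic, not any client API.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def reads_see_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read replica set overlaps every write replica set (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, reads_see_latest_write(read_level, write_level, 3))
```

Waiting on more replicas is where the extra latency mentioned above comes from, so for a read-only website workload the interesting knob is mostly the read level.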

4 Answers

9

If you don't need Solr's full-text search capabilities, there's very little reason to choose it over Cassandra, in my opinion.

(Disclosure: I work for DataStax.)

Operationally, handling a Cassandra cluster will be much simpler due to the Dynamo-based architecture. Sharding Solr can be quite painful, which is one of the big reasons why we at DataStax built search into DSE; it's something that a lot of people want to avoid. I'm not trying to sell you on DSE, just pointing out the downside to Solr.

For example, when you want to change the number of shards with Solr, you have to create and build an entirely new index. You have to worry about deadlock with a Solr cluster. There are several other limitations: http://wiki.apache.org/solr/DistributedSearch
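
To make the resharding pain concrete, here is a minimal sketch comparing two placement schemes. It assumes documents are assigned to shards by `hash(key) % shard_count` (a common manual scheme, used here only as an illustration and not as Solr's actual routing) and contrasts it with a Dynamo-style hash ring. The `HashRing` class, the node names and the use of 100 virtual nodes per node are my own assumptions for the example.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic hash (Python's built-in hash() is randomized per process)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def modulo_shard(key: str, shard_count: int) -> int:
    """Naive placement: the shard depends directly on the number of shards."""
    return stable_hash(key) % shard_count


class HashRing:
    """Consistent-hash ring: each node owns the arcs that end at its tokens."""

    def __init__(self, nodes, vnodes_per_node=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [token for token, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First token clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._tokens, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


def fraction_moved(keys, before, after) -> float:
    """Share of keys whose owner changes between two placement functions."""
    moved = sum(1 for key in keys if before(key) != after(key))
    return moved / len(keys)


keys = [f"doc-{i}" for i in range(10_000)]

# Going from 4 to 5 shards/nodes under each scheme.
modulo_moved = fraction_moved(
    keys, lambda k: modulo_shard(k, 4), lambda k: modulo_shard(k, 5)
)
old_ring = HashRing(["n1", "n2", "n3", "n4"])
new_ring = HashRing(["n1", "n2", "n3", "n4", "n5"])
ring_moved = fraction_moved(keys, old_ring.node_for, new_ring.node_for)

print(f"modulo: {modulo_moved:.0%} moved, ring: {ring_moved:.0%} moved")
```

Under the modulo scheme most documents change shard when the count changes, which is effectively a full rebuild; on the ring only roughly the new node's share moves.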

You haven't said much about what kind of queries you need to be able to support. Adding that info would get you better answers.

Tyler Hobbs
  • (Currently the queries are just read queries, fetched using a join on 4 MySQL tables. Let me know if there is anything more you want to know about the nature of the queries.) So, from your input, can I conclude that if our index is on a single system and we don't need full-text search, then Solr and Cassandra are likely to perform equally, but if the index is distributed then Cassandra would be better to use? Or is Cassandra going to perform better even on a single machine? If yes/no, then why? – codersofthedark Apr 18 '12 at 02:17
  • @dragosrsupercool the nature of the queries would definitely be helpful; the more details the better. – Tyler Hobbs Apr 18 '12 at 17:54
5
  • Cassandra is a NoSQL data store designed to handle huge amounts of data: terabytes and beyond. It was definitely designed to perform.
  • Remember that NoSQL databases or data stores have limited capabilities when it comes to queries. They do not offer JOIN queries, because joins would kill such a system. Think about it!
  • You would definitely be able to read and write pretty fast, and some of the data can be queried.
  • Flexible schema: you can push sparse data into it. That is, where in a typical database you push NULL for an empty entry, here you don't push it at all :) You don't need to!
  • No full-text searching.

This is where the big BUT comes in.

  • Having said the above, Solr on the other hand is a TF-IDF full-text search engine, though you can use it as your DB.
  • Flexible Schema. Just mark fields that are not required.
  • Solr will help in tokenizing, parsing and indexing the data pretty quickly, and its response times are superb. It returns XML, which you can parse into whatever representation you need.
  • Read queries are fast, and I mean really fast. But I have no comparison between Cassandra and Solr to share.

And in the end, since you want Cassandra and Solr together, check out Solandra (formerly Lucandra).
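
Since neither store will do the JOIN for you, the usual workaround is to run the join once, offline, and store the result keyed by how the website reads it. A rough sketch of that idea follows; the tables, field names and the `build_orders_by_user` helper are all hypothetical stand-ins for the real MySQL schema, and plain dicts stand in for the target store.

```python
from collections import defaultdict

# Hypothetical normalized tables, standing in for the MySQL tables joined today.
users = {1: {"name": "alice"}, 2: {"name": "bob"}}
products = {100: {"title": "keyboard"}, 101: {"title": "mouse"}}
orders = [
    {"order_id": 10, "user_id": 1, "product_id": 100},
    {"order_id": 11, "user_id": 1, "product_id": 101},
    {"order_id": 12, "user_id": 2, "product_id": 100},
]


def build_orders_by_user(users, orders, products):
    """Resolve the join ahead of time and key the result by the read pattern."""
    rows = defaultdict(dict)
    for order in orders:
        user = users[order["user_id"]]
        product = products[order["product_id"]]
        # One wide row per user, one entry per order. Missing values are
        # simply not stored, which is the sparse-data point made above.
        rows[order["user_id"]][order["order_id"]] = {
            "user_name": user["name"],
            "product_title": product["title"],
        }
    return dict(rows)


orders_by_user = build_orders_by_user(users, orders, products)

# A front-end read is now a single key lookup instead of a multi-table join.
print(orders_by_user[1])
```

The cost is duplicated data and a rebuild step whenever the source tables change, which matters less for a read-only workload like the one in the question.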

  • 2
    Solr [allows](http://wiki.apache.org/solr/SchemaXml#Dynamic_fields) for a flexible schema. – Xodarap Apr 17 '12 at 16:39
  • 1
    @Wajih: Agreed, Cassandra can take care of huge amounts of data, but Solr can also do that; it scales well, I guess. Please correct me if I am wrong. Moreover, JOIN is something neither Solr nor Cassandra can provide. A flexible schema is provided by both of them. Solr has its high-performing full-text search, but we don't need it in our particular situation. Nor do we need write operations. So now you say Cassandra reads are pretty fast, but then you say Solr is fast too, so the question still is: what makes one better than the other in our case? :( – codersofthedark Apr 17 '12 at 17:50
  • I have rewritten the question for better clarity of the situation. :) – codersofthedark Apr 17 '12 at 17:52
  • @Xodarap - Meant to say sparse data. Maybe I need to elaborate my answer. –  Apr 24 '12 at 12:24
  • @Xodarap - Hmmm... I guess I have missed this important point. I will look into it. Do you have any links? I mean apart from the official Cassandra ones? Last time I worked with Solr I had this problem. Maybe I was doing something wrong... –  Apr 24 '12 at 16:34
  • @Wajih: Just don't mark your fields as required. See e.g. [SOLR-181](https://issues.apache.org/jira/browse/SOLR-181). – Xodarap Apr 24 '12 at 17:14
4

You can also take a look at DataStax.
There are Community and Enterprise editions, though I think Solr isn't included in the Community edition :(

Solandra is not being actively developed any more; the author moved to DataStax and continued his work there.

IMHO, DataStax is to Cassandra what Cloudera is to Hadoop.

Marko Bonaci
  • 1
    Oh yes... forgot about DataStax. Should have mentioned it :) –  Apr 17 '12 at 12:21
  • @mbonaci: It's really important to understand why we would need to use both of them, and thus DataStax / Solandra. I mean, what is there in Cassandra that Solr cannot provide, and vice versa? We don't need full-text search. So can't one replace the other? – codersofthedark Apr 17 '12 at 17:45
  • I have rewritten the question for better clarity of the situation. :) – codersofthedark Apr 17 '12 at 17:51
  • Sorry, if you don't need FT search you don't need Solr at all (as Tyler also said). Solr is a search engine first and all other things second. The DataStax Cassandra community edition (no Solr) should then be enough to get started building a proof of technology. – Marko Bonaci Apr 19 '12 at 07:24
2

Solr's indexing features would outperform Cassandra for reads. It'll index popular queries, so frequent ones will be faster still. It was built for reads; Cassandra is built to store. But, as already stated, Cassandra will scale awesomely if that's needed. Why not benchmark a single node: 1 million random text strings, averaged over 1 million queries? Either of them will outperform MySQL, let alone MySQL join queries. PS: Solr will soon support joins, in Solr 4 I think.
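
A rough harness for the kind of single-node benchmark suggested above. This is only a sketch: `random_strings` and `benchmark` are hypothetical helpers, and an in-memory dict stands in for the store so the code runs on its own. For a real comparison you would pass a function that issues the read through your Cassandra or Solr client, and scale the counts up to the 1 million mentioned above.

```python
import random
import statistics
import string
import time


def random_strings(count, length=16, seed=42):
    """Generate reproducible random lowercase strings to use as test keys."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(count)]


def benchmark(lookup, keys, query_count, seed=7):
    """Time lookup(key) for randomly chosen keys; return latency stats in ms."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(query_count):
        key = rng.choice(keys)
        start = time.perf_counter()
        lookup(key)
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    return {
        "queries": query_count,
        "mean_ms": statistics.fmean(latencies),
        "p99_ms": latencies[int(0.99 * (len(latencies) - 1))],
    }


# Stand-in store (an assumption for this sketch): replace store.get with a
# function that performs the same read against the system under test.
keys = random_strings(10_000)
store = {key: len(key) for key in keys}
print(benchmark(store.get, keys, query_count=10_000))
```

Running the same key set and query sequence against each candidate, and looking at the tail latency as well as the mean, gives a fairer comparison than a single average.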