284

With the NoSQL movement growing around document-based databases, I've been looking at MongoDB lately. I noticed a striking similarity in how it treats items as "Documents", just as Lucene does (and, by extension, Solr).

So, the question: Why would you want to use NoSQL (MongoDB, Cassandra, CouchDB, etc) over Lucene (or Solr) as your "database"?

What I am (and I am sure others are) looking for in an answer is a deep-dive comparison of them. Let's skip relational database discussions altogether, as they serve a different purpose.

Lucene offers some serious advantages, such as powerful search and relevance weighting, not to mention facets in Solr (which is being merged with Lucene soon, yay!). You can store IDs in Lucene documents and fetch documents by ID, just as you would in MongoDB. Add Solr, and you get a web-service-based, load-balanced solution.
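To make the "documents plus search plus facets" idea concrete, here is a minimal sketch, assuming a toy in-memory index. `ToyIndex` and its methods are hypothetical names made up for illustration; they are not Lucene or Solr calls, and there is no scoring here.

```python
from collections import Counter, defaultdict

# Toy in-memory stand-in for a search index; not the Lucene or Solr API.
class ToyIndex:
    def __init__(self):
        self.docs = {}                      # doc id -> stored fields
        self.postings = defaultdict(set)    # term -> ids of docs containing it

    def add(self, doc_id, fields):
        self.docs[doc_id] = fields
        for term in fields["text"].lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))

    def facet(self, term, field):
        # Count how many matching docs fall under each value of `field`.
        return Counter(self.docs[i][field] for i in self.search(term))

index = ToyIndex()
index.add("1", {"text": "Lucene in Action", "category": "book"})
index.add("2", {"text": "Solr in Action", "category": "book"})
index.add("3", {"text": "Lucene talk", "category": "video"})

hits = index.search("lucene")
facets = index.facet("action", "category")
```

Documents are stored and fetched by ID, much like in a document database; the postings map is the extra structure that makes term search and facet counts possible.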

You can even throw in a comparison of out-of-proc cache providers such as Velocity or MemCached when talking about similar data storing and scalability of MongoDB.

The restrictions around MongoDB remind me of using MemCached, but I can use Microsoft's Velocity and have more grouping and list collection power than with MongoDB (I think). You can't get any faster or more scalable than caching data in memory. Even Lucene has a memory provider.

MongoDB (and others) do have some advantages, such as the ease of use of their API. New up a document, create an id, and store it. Done. Nice and easy.
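The "new up a document, create an id, store it" workflow looks roughly like the sketch below. This is a toy in-memory collection, not a real driver: `ToyCollection`, `save` and `get` are hypothetical names, and a real client would talk to a server instead of a dict.

```python
import uuid

# Hypothetical in-memory collection; a real driver would talk to a server.
class ToyCollection:
    def __init__(self):
        self._docs = {}

    def save(self, doc):
        doc = dict(doc)                          # copy so the caller's dict is untouched
        doc.setdefault("_id", uuid.uuid4().hex)  # generate an id when none is given
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def get(self, doc_id):
        return self._docs.get(doc_id)

posts = ToyCollection()
new_id = posts.save({"title": "Hello", "tags": ["nosql"]})
stored = posts.get(new_id)
```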

Community
eduncan911
  • 8
    See http://stackoverflow.com/questions/2546494/is-mongodb-a-valid-alternative-to-relational-db-lucene – bajafresh4life Jul 09 '10 at 18:30
  • 4
    Thank you, but that does not answer my question, which is: why would I use MongoDB instead of Lucene for my database? They both handle documents, but Lucene has some very powerful search options. +1 though for actually finding a related question. I searched several times on Stack Overflow and did not come up with a close comparison. – eduncan911 Jul 09 '10 at 20:26
  • How are you using Lucene that it provides functionality similar to MongoDB? Are you tying it to a relational DB for storage? – Philip Tinney Jul 09 '10 at 20:49
  • 1
    @Philip: It's a hypothetical question. Why not use Lucene as your document storage? You get a lot more searching power and scalability (when mixed with Solr, making Lucene even easier to use). – eduncan911 Jul 10 '10 at 13:46

10 Answers

260

This is a great question, something I have pondered over quite a bit. I will summarize my lessons learned:

  1. You can easily use Lucene/Solr in lieu of MongoDB for pretty much all situations, but not vice versa. Grant Ingersoll's post sums it up here.

  2. MongoDB etc. seem to serve a purpose where there is no requirement for searching and/or faceting. It appears to be a simpler and arguably easier transition for programmers detoxing from the RDBMS world. Unless one is used to them, Lucene & Solr have a steeper learning curve.

  3. There aren't many examples of using Lucene/Solr as a datastore, but the Guardian has made some headway and summarizes it in an excellent slide deck. They too are non-committal about jumping fully on the Solr bandwagon and are "investigating" combining Solr with CouchDB.

  4. Finally, I will offer our experience, though unfortunately I cannot reveal much about the business case. We work at the scale of several TB of data, in a near-real-time application. After investigating various combinations, we decided to stick with Solr. No regrets thus far (6 months and counting), and we see no reason to switch to something else.

Summary: if you do not have a search requirement, Mongo offers a simple & powerful approach. However if search is key to your offering, you are likely better off sticking to one tech (Solr/Lucene) and optimizing the heck out of it - fewer moving parts.

My 2 cents, hope that helped.

LeeWallen
Mikos
  • 11
    Solr has no map reduce functionality. Therefore reporting, stats, computation of scores etc. are not possible! Use Solr only if you have/can treat your data as text data – Roland Kofler Dec 04 '11 at 08:23
  • 8
    Solr does not have map-reduce built-in, but you can combine with Hadoop. http://architects.dzone.com/articles/solr-hadoop-big-data-love – Mikos Dec 04 '11 at 23:10
  • 6
    Map-reduce no, but it does have the ability to run a query in parallel across multiple solr servers and aggregate those results. So while it doesn't have general purpose map-reduce it has already written what you would be writing with map-reduce which is parallel search queries. – chubbsondubs Jun 14 '12 at 03:36
  • @Roo: Would it be an option to use Lucene as a main DB and create aggregate indexes with MongoDB somehow? Or doesn't that make sense? And Mikos: great answer and +1 for the real-world experience mention. – Grimace of Despair Jul 16 '13 at 10:47
  • @Mikos since it has been long since you last answered this question .. What are your thoughts now ? – user794783 Aug 26 '15 at 08:56
  • 2
    from solr6 it supports map reduce functionality with parallel expressions – Divyang Shah Dec 16 '16 at 11:23
  • over nine years later, I am moved to say how annoyed I am at the idea that programmers need to "detox from RDBMS". RDBMS is not a drug, nor a poison. It is a tool that is useful in many, many instances, and totally inappropriate in many others. – Ross Presser Oct 22 '19 at 20:46
37

You can't partially update a document in solr. You have to re-post all of the fields in order to update a document.

And performance matters. If you do not commit, your change to Solr does not take effect; if you commit every time, performance suffers.
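The commit trade-off can be pictured with a toy model, assuming writes sit in a pending buffer until a commit makes them searchable. `ToyCommitIndex` is a hypothetical class for illustration, not Solr's API, and the `commits` counter merely stands in for the real cost of each commit.

```python
# Toy model of commit semantics; class and method names are hypothetical.
class ToyCommitIndex:
    def __init__(self):
        self.pending = {}     # writes accepted but not yet searchable
        self.visible = {}     # what searches can see
        self.commits = 0      # each commit stands in for an expensive flush

    def add(self, doc_id, doc):
        self.pending[doc_id] = doc

    def commit(self):
        self.visible.update(self.pending)
        self.pending.clear()
        self.commits += 1

    def get(self, doc_id):
        return self.visible.get(doc_id)

docs = {str(n): {"title": f"doc {n}"} for n in range(100)}

per_doc = ToyCommitIndex()
for doc_id, doc in docs.items():
    per_doc.add(doc_id, doc)
    per_doc.commit()              # visible immediately, but 100 commits

batched = ToyCommitIndex()
for doc_id, doc in docs.items():
    batched.add(doc_id, doc)
invisible_before = batched.get("0")   # None: nothing committed yet
batched.commit()                      # one commit makes all 100 visible
```

Batching trades freshness for throughput: the batched index pays for one commit instead of a hundred, but nothing is searchable until that commit happens.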

There is no transaction in solr.

Because Solr has these disadvantages, sometimes NoSQL is a better choice.

UPDATE: Solr 4+ supports soft commits in addition to hard commits. Refer to the latest documentation: https://lucene.apache.org/solr/guide/8_5/

iDroid
Peter Long
  • 15
    MongoDB does not have transactions either. – user183037 Oct 28 '11 at 21:01
  • 1
    Solr or Lucene have realtime search, so committing is not an issue. – mihaicc Jun 22 '12 at 08:02
  • 2
    @user183037 in MongoDB any update within a single document is atomic. And FYI, Lucene doesn't have transactions (in your sense) either – Aravind Yarram Nov 01 '12 at 21:02
  • 48
    This answer has become incorrect. Solr 4+ does support partial updates, and soft commits / near real time do away with most of the issues of "old-style" Solr commits. – Mauricio Scheffer Jan 30 '13 at 22:47
  • 3
    They added support for transactions on MongoDB 4. – Jonas P. Mar 14 '19 at 02:46
  • @MauricioScheffer Even though Solr has "partial updates", under the hood it still has to read the entire document, update the changed fields and write it back. So Lucene actually does not have partial updates, even to this day. – Brain2000 Nov 05 '20 at 19:22
29

We use MongoDB and Solr together and they perform well. You can find my blog post here, where I describe how we use these technologies together. Here's an excerpt:

[...] However we observe that query performance of Solr decreases when index size increases. We realized that the best solution is to use both Solr and Mongo DB together. Then, we integrate Solr with MongoDB by storing contents into the MongoDB and creating index using Solr for full-text search. We only store the unique id for each document in Solr index and retrieve actual content from MongoDB after searching on Solr. Getting documents from MongoDB is faster than Solr because there is no analyzers, scoring etc. [...]
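The two-step lookup described in the excerpt can be sketched as follows. Both stores are plain in-memory structures standing in for MongoDB and Solr; `content_store`, `search_index`, `save` and `search` are hypothetical names, and the whitespace tokenizer is a deliberate simplification.

```python
from collections import defaultdict

# Both stores are in-memory stand-ins; the names are hypothetical.
content_store = {}                 # plays the MongoDB role: id -> full document
search_index = defaultdict(set)    # plays the Solr role: term -> ids only

def save(doc_id, doc):
    content_store[doc_id] = doc
    for term in doc["body"].lower().split():
        search_index[term].add(doc_id)      # index holds the id, not the content

def search(term):
    ids = sorted(search_index.get(term.lower(), set()))   # step 1: ids from the index
    return [content_store[i] for i in ids]                # step 2: content by id

save("a1", {"title": "Intro", "body": "full text search with Solr"})
save("b2", {"title": "Storage", "body": "document storage with MongoDB"})

results = search("solr")
```

Keeping only IDs in the index keeps it small, while the content store answers the by-ID reads without any analysis or scoring.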

KajMagnus
Parvin Gasimzade
  • 3
    Good blog post. Yes, this is exactly how I've used Lucene in the past with older SQL and MySql datastores (storing IDs in Lucene, and retrieving the complex types from the datastore). Technically though, this question was to explore the differences between the two - not exactly how to use the "best of both worlds." +1 for using it that way, as it's really the only real way to use massive amounts of data. – eduncan911 Dec 26 '12 at 02:59
  • Thanks for your response. I know that the question is about choosing Nosql over Lucene but here I want to show that, instead of choosing one over other, using them in a hybrid manner will give the better result. – Parvin Gasimzade Dec 26 '12 at 07:58
  • 2
    Do you remember (now 1.5 years later) roughly the size of the Solr database when the query performance had decreased so much so you started thinking about adding MongoDB? (Was it 10,000 docs or 10,000,000 docs?) – KajMagnus Jul 02 '13 at 22:01
  • Very helpful. I work in GIS and so being able to combine full-text with spatial search in this way is very intriguing. We already use MongoDB and Postgres, and I have been thinking about Solr for a while. – John Powell Apr 06 '14 at 10:07
  • 3
    @ParvinGasimzade the blog post link is not working. Could you please provide another link or source ? – oblivion Jan 05 '17 at 08:15
  • @ParvinGasimzade , the blog link is not working. Can I find the content posted elsewhere ? – Mahesh Apr 08 '17 at 07:50
  • @oblivion you can get the blog post via the wayback machine at http://web.archive.org/web/20160305180027/http://www.gasimzade.org/2012/11/under-hood-architectural-overview-of.html – Suzana Jun 16 '20 at 14:58
24

Also please note that some people have integrated Solr/Lucene into Mongo by having all indexes be stored in Solr and also monitoring oplog operations and cascading relevant updates into Solr.

With this hybrid approach you can really have the best of both worlds with capabilities such as full text search and fast reads with a reliable datastore that can also have blazing write speed.

It's a bit technical to set up, but there are lots of oplog tailers that can integrate with Solr. Check out what Rangespan did in this article.

http://denormalised.com/home/mongodb-pub-sub-using-the-replication-oplog.html
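As a rough picture of what an oplog tailer does, the sketch below replays a list of change records into a search index. The record shapes are simplified assumptions loosely modelled on an oplog (they are not the real oplog format), and `search_index` and `apply_entry` are hypothetical names.

```python
# Simplified change records loosely modelled on an oplog; the exact shapes are assumptions.
oplog = [
    {"op": "i", "doc": {"_id": 1, "title": "first draft"}},
    {"op": "i", "doc": {"_id": 2, "title": "second post"}},
    {"op": "u", "id": 1, "set": {"title": "final draft"}},
    {"op": "d", "id": 2},
]

search_index = {}   # stand-in for the Solr index: id -> indexed fields

def apply_entry(index, entry):
    """Cascade one change record into the search index."""
    if entry["op"] == "i":                       # insert: index the new document
        index[entry["doc"]["_id"]] = dict(entry["doc"])
    elif entry["op"] == "u":                     # update: merge changed fields
        index[entry["id"]].update(entry["set"])
    elif entry["op"] == "d":                     # delete: drop it from the index
        index.pop(entry["id"], None)

last_applied = 0
for position, entry in enumerate(oplog, start=1):
    apply_entry(search_index, entry)
    last_applied = position   # a real tailer would persist this to resume after a restart
```

The datastore stays the source of truth; the index is a derived view that can be rebuilt by replaying the log.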

Prasith Govin
  • If I understood you correctly, the reason you use MongoDB (in addition to Solr), is that MongoDB has faster insertion + read speed? Did you also indicate that MongoDB has a more reliable datastore? (Or were you referring to Solr?) — What did you start with initially? Only MongoDB, only Solr, or both Mongo + Solr? – KajMagnus Jul 02 '13 at 21:21
12

From my experience with both, Mongo is great for simple, straightforward usage. The main Mongo disadvantage we've suffered is poor performance on unanticipated queries (you cannot create Mongo indexes for all the possible filter/sort combinations, you simply can't).

And this is where Lucene/Solr prevails big time: especially with filter query caching, performance is outstanding.
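Why this helps with unanticipated filter combinations can be sketched with a toy cache, assuming each individual filter's matching ID set is cached separately and combinations are answered by intersecting those sets. The data, `filter_cache` and the function names are illustrative only, not Solr internals.

```python
# Toy filter cache; the data and names are illustrative, not Solr internals.
docs = {
    1: {"color": "red", "size": "L", "in_stock": True},
    2: {"color": "red", "size": "M", "in_stock": False},
    3: {"color": "blue", "size": "L", "in_stock": True},
    4: {"color": "red", "size": "L", "in_stock": True},
}

filter_cache = {}   # (field, value) -> frozenset of matching ids
scans = 0           # counts full passes over the documents

def ids_for(field, value):
    global scans
    key = (field, value)
    if key not in filter_cache:
        scans += 1   # only a cache miss pays for a scan
        filter_cache[key] = frozenset(i for i, d in docs.items() if d[field] == value)
    return filter_cache[key]

def search(**filters):
    # Intersect one cached id set per filter; any combination reuses the same entries.
    id_sets = [ids_for(field, value) for field, value in filters.items()]
    return sorted(set(docs).intersection(*id_sets))

first = search(color="red", size="L")                  # two misses, two scans
second = search(size="L", color="red", in_stock=True)  # one new filter, one more scan
```

No index per filter/sort combination is needed: three cached sets cover every combination of these three filters.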

mjalajel
11

Since no one else mentioned it, let me add that MongoDB is schema-less, whereas Solr enforces a schema. So, if the fields of your documents are likely to change, that's one reason to choose MongoDB over Solr.

Aquarelle
  • 6
    that IMHO is not quite true. Solr does have a schema as defined in `schema.xml`, BUT it does also have 'dynamic fields', i.e. fields whose types are determined via wildcards, so you can have all fields matching, say, `*_i` indexed as integer fields. when adding documents, you can then have documents containing fields like `count_i`, `foo_i`, `bar_i` that are all understood as integer fields without appearing in `schema.xml` literally. pretty schema-less, i'd say. see http://www.youtube.com/watch?v=WYVM6Wz-XTw for more. – flow Aug 26 '13 at 11:34
  • I have to come back and bump this up with a +1 because that is true - schema changes in Solr have always been a PITA to keep in sync with other data stores. – eduncan911 Jun 09 '14 at 19:58
  • 4
    Solr has a feature that supports schema or no-schema! – Krunal Sep 21 '15 at 13:47
5

@mauricio-scheffer mentioned Solr 4 - for those interested in that, LucidWorks is describing Solr 4 as "the NoSQL Search Server" and there's a video at http://www.lucidworks.com/webinar-solr-4-the-nosql-search-server/ where they go into detail on the NoSQL(ish) features. (The -ish is for their version of schemaless actually being a dynamic schema.)
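A dynamic schema of this kind can be pictured as wildcard patterns mapped to field types. The sketch below is an illustration in plain Python, not a real schema file: the rule list and the `field_type` and `validate` helpers are hypothetical.

```python
from fnmatch import fnmatchcase

# Illustrative wildcard rules in the spirit of dynamic fields; not a real schema file.
DYNAMIC_FIELDS = [
    ("*_i", int),
    ("*_s", str),
    ("*_b", bool),
]

def field_type(name):
    """Return the type of the first rule whose pattern matches the field name."""
    for pattern, py_type in DYNAMIC_FIELDS:
        if fnmatchcase(name, pattern):
            return py_type
    raise KeyError(f"no rule matches field {name!r}")

def validate(doc):
    """Check each field against its rule; the field names were never declared one by one."""
    for name, value in doc.items():
        expected = field_type(name)
        if not isinstance(value, expected):
            raise TypeError(f"{name} should be {expected.__name__}")
    return doc

ok = validate({"count_i": 3, "foo_i": 7, "title_s": "hello"})
```

New field names are accepted without a schema change as long as they follow a naming rule, which is why this is closer to a dynamic schema than to having no schema at all.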

Beth
1

If you just want to store data in key-value format, Lucene is not recommended, because its inverted index wastes too much disk space. And with the data stored on disk, its performance is much slower than that of NoSQL databases such as Redis, because Redis keeps data in RAM. Lucene's biggest advantage is that it supports many kinds of queries, including fuzzy queries.
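The trade-off can be sketched in a few lines, assuming a plain dict for key-value access and a hand-built term map for the inverted index. Python's `difflib` stands in for real fuzzy matching here, and `fuzzy_search` is a hypothetical helper, not a Lucene call.

```python
import difflib
from collections import defaultdict

records = {
    "u1": "apache lucene search library",
    "u2": "redis in-memory key value store",
}

# Key-value access: a single lookup by key, with no extra structures.
kv_hit = records["u2"]

# Inverted index: extra term -> ids entries, which cost space but enable term queries.
inverted = defaultdict(set)
for key, text in records.items():
    for term in text.split():
        inverted[term].add(key)

def fuzzy_search(word, cutoff=0.8):
    """Match a possibly misspelled word against indexed terms (difflib stands in for real fuzzy matching)."""
    ids = set()
    for term in difflib.get_close_matches(word, list(inverted), n=3, cutoff=cutoff):
        ids |= inverted[term]
    return sorted(ids)

fuzzy_hits = fuzzy_search("lucen")   # misspelling of "lucene"
```

The index holds more entries than there are records, which is the space overhead mentioned above; in exchange, a misspelled term can still find its document.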

张洪岩
1

MongoDB Atlas will have a lucene-based search engine soon. The big announcement was made at this week's MongoDB World 2019 conference. This is a great way to encourage more usage of their high revenue MongoDB Atlas product.

I was hoping to see it rolled into the MongoDB Enterprise version 4.2 but there's been no news of bringing it to their on-prem product line.

More info here: https://www.mongodb.com/atlas/full-text-search

Gary Russo
0

Third-party solutions, like a Mongo oplog tailer, are attractive. Questions remain about whether the two could be tightly integrated from a development/architecture perspective. I don't expect to see a tightly integrated solution for these features, for a few reasons (somewhat speculative, subject to clarification, and not up to date with development efforts):

  • mongo is c++, lucene/solr are java
  • lucene supports various doc formats
    • mongo is focused on JSON (BSON)
  • lucene uses immutable documents
    • single field updates are an issue, if they are available
  • lucene indexes are immutable with complex merge ops
  • mongo queries are javascript
  • mongo has no text analyzers / tokenizers (AFAIK)
  • mongo doc sizes are limited, that might go against the grain for lucene
  • mongo aggregation ops may have no place in lucene
    • lucene has options to store fields across docs, but that's not the same thing
    • solr somehow provides aggregation/stats and SQL/graph queries
Darren Weber