
We are planning to store millions of documents in MongoDB, and full text search is essential. I've read that Elasticsearch and Solr are the best available solutions for full text search.

  • Is Elasticsearch mature enough to be used for MongoDB full text search? We will also be sharding the collections. Does Elasticsearch work with sharded collections?

  • What are the advantages and disadvantages of using Elasticsearch or Solr?

  • Is MongoDB capable of doing full text search?

Jeroen
atandon

7 Answers


There are some search capabilities in MongoDB, but they are not as feature-rich as those of dedicated search engines.

http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo
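A common workaround from that era, before MongoDB had built-in text search, was to store each document's words in an array field and index that field, so MongoDB builds a multikey index over the words. A minimal sketch of the document and query shapes in plain Python, assuming a field named `_keywords` and a deliberately naive tokenizer; the dicts are what you would hand to a driver such as PyMongo, and no driver is called here.

```python
import re


def keywords(text: str) -> list[str]:
    """Naive tokenizer: lowercase alphanumeric words, de-duplicated, first-seen order."""
    seen: dict[str, None] = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        seen.setdefault(word, None)
    return list(seen)


def make_document(title: str, body: str) -> dict:
    # `_keywords` is an assumed field name. An ordinary index on it, e.g.
    # {"_keywords": 1}, becomes a multikey index because the value is an array.
    return {"title": title, "body": body, "_keywords": keywords(title + " " + body)}


def make_query(search: str) -> dict:
    # Match only documents that contain every search word.
    return {"_keywords": {"$all": keywords(search)}}


doc = make_document("Full text search", "Search inside MongoDB documents")
print(doc["_keywords"])  # ['full', 'text', 'search', 'inside', 'mongodb', 'documents']
print(make_query("MongoDB search"))  # {'_keywords': {'$all': ['mongodb', 'search']}}
```

This gives exact-word matching only: no stemming, no relevance ranking, which is exactly the feature gap compared with a search engine.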

We use Mongo with Solr to make content searchable. We prefer Solr because:

  • It is easy to configure and customize
  • It has a large community (this is really helpful if you are working with open-source tools)

Since we haven't worked with ES, I can't say much about it. You can find some discussions about Solr vs ES in the links below.

Parvin Gasimzade
  • Thanks Parvin. Your post was helpful. – atandon Jun 14 '12 at 07:22
  • "not as efficient as search engines" ... I would say "not as **feature-rich** as search engines." Mongo multikey search efficiency is not bad, but it lacks features present in search engines. – Zaid Masud Aug 25 '12 at 11:32
  • Thank you for the warning. Updated it to "feature-rich". – Parvin Gasimzade Dec 26 '12 at 08:04
  • There is currently an experimental full text search feature in the latest development version of MongoDB: http://blog.mongodb.org/post/40513621310/mongodb-text-search-experimental-feature-in-mongodb –  Feb 07 '13 at 10:20

I have professional experience with both Solr/MySQL and ElasticSearch/MongoDB.

If you are going to query your search engine heavily and you already shard MongoDB (I mean, if you want to shard your search engine too), you should use ElasticSearch, unless what you want to do can't be done with ElasticSearch. And you should use it even if you are not going to shard.

ElasticSearch is a new project built on top of Lucene that adds a sharding mechanism, created by someone used to distributed environments and search (Shay Banon made Compass and worked for GigaSpaces, the data grid vendor).

Sharding ElasticSearch is as easy as sharding MongoDB; I think it is even simpler, and the defaults work great for most cases.
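To make the sharding mechanism concrete: ElasticSearch picks a primary shard for each document roughly as `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. The sketch below simulates that placement in plain Python; `zlib.crc32` is a stand-in for ElasticSearch's own hash function, and `NUM_PRIMARY_SHARDS = 5` is just an example value.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # example value; fixed when the index is created


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    # Stand-in hash (crc32), not the function ElasticSearch actually uses.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def distribute(doc_ids: list[str], num_shards: int = NUM_PRIMARY_SHARDS) -> dict[int, list[str]]:
    """Group document ids by the shard their routing value maps to."""
    shards: dict[int, list[str]] = {i: [] for i in range(num_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


placement = distribute([f"doc-{n}" for n in range(20)])
for shard, ids in placement.items():
    print(shard, len(ids))
```

Because the modulus depends on the shard count, the number of primary shards has traditionally been fixed at index creation: changing it would send existing documents to different shards.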


I don't like Solr that much.

  • The query language is not structured at all (but that is also true of plugins and Lucene, and I think you can use this unstructured query language with ES too)
  • I don't think there is a proper Solr client. The Solr Java client sucks, and I hear PHP developers complaining too, while the ElasticSearch Java client is very nice, much more type-safe, and offers async support (nice if you use Netty, for example). With Solr, you will do a LOT of string concatenation.
  • It is less easy to scale
  • It is not such a new project, and I felt the technical debt it carries. ElasticSearch was born from Compass, so I guess all the technical debt was dropped in favor of a fresh approach.
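The string-concatenation point can be seen by building the same two-field phrase query for each engine. A sketch assuming hypothetical fields `title` and `body`: the Solr version is a Lucene-syntax `q` parameter for a `/select` request, and the ElasticSearch version is a query-DSL body for `_search`. Only the request payloads are built; nothing is sent.

```python
import json
from urllib.parse import urlencode


def solr_select_params(title: str, body: str, rows: int = 10) -> str:
    # Solr: the query is one Lucene-syntax string assembled by concatenation,
    # so quoting and escaping user input is the caller's problem.
    q = 'title:"' + title + '" AND body:"' + body + '"'
    return urlencode({"q": q, "rows": rows, "wt": "json"})


def es_search_body(title: str, body: str, size: int = 10) -> str:
    # ElasticSearch: the same query as nested, structured JSON (query DSL).
    query = {
        "size": size,
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"title": title}},
                    {"match_phrase": {"body": body}},
                ]
            }
        },
    }
    return json.dumps(query)


print(solr_select_params("mongodb", "sharding"))
print(es_search_body("mongodb", "sharding"))
```

With the DSL, each clause is a separate value, so a client library can build and type-check it instead of splicing strings.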

Concerning data import, I have experience with both Solr's DataImportHandler and ElasticSearch rivers (CouchDB and MongoDB). What I can tell you is:

  • Solr lets you do more things, but in a very unstructured XML way, and the documentation doesn't help much in understanding what is really happening once you move past hello world and try to use some advanced features.
  • The ElasticSearch approach is simpler and also more limited, but it has out-of-the-box support for some technologies, while DataImportHandler seems friendlier to complex SQL.
  • In my Solr project I had to index some documents manually, but mostly because it was impossible to denormalize the needed data into a document (that project uses MySQL).

There is also a new MongoDB connector for both Solr and ElasticSearch, which I need to test ASAP :) http://blog.mongodb.org/post/29127828146/introducing-mongo-connector


So in the end, I would definitely choose ElasticSearch, because:

  • It now has a great community
  • Many people I know who have experience with Solr like ElasticSearch
  • The client side is safer and more structured, and provides async support with Java Futures
  • Both can probably import data from MongoDB easily with the new connector
  • As far as I know, it can do almost everything Solr does (in my experience, but I'm not a search engine expert)
  • It adds sharding out of the box
  • It adds percolation, which can help to build real-time scalable applications (but you'll probably need an additional messaging technology)
  • The source code I read has nearly no technical debt compared to Solr (at least on the client side), and it seems easy to create plugins.
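Percolation reverses the usual flow: queries are registered up front, and each incoming document is checked against them to find which queries it matches. The toy class below only illustrates that idea in plain Python; it is not the ElasticSearch percolator API, and its "all terms present" matching is a simplification of real queries. Pushing the matched query IDs to subscribers is where the additional messaging technology would come in.

```python
from dataclasses import dataclass


@dataclass
class StoredQuery:
    query_id: str
    required_terms: frozenset[str]  # all terms must appear in the document


def tokenize(text: str) -> set[str]:
    return set(text.lower().split())


class Percolator:
    """Toy reverse search: queries are stored, documents are matched against them."""

    def __init__(self) -> None:
        self._queries: list[StoredQuery] = []

    def register(self, query_id: str, terms: str) -> None:
        self._queries.append(StoredQuery(query_id, frozenset(tokenize(terms))))

    def percolate(self, document: str) -> list[str]:
        words = tokenize(document)
        return [q.query_id for q in self._queries if q.required_terms <= words]


p = Percolator()
p.register("alert-mongo", "mongodb sharding")
p.register("alert-solr", "solr")
print(p.percolate("New MongoDB sharding guide published"))  # ['alert-mongo']
```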
Sebastien Lorber

Natively, no, MongoDB doesn't have full text search support. You can see that it is a popular feature request:

https://jira.mongodb.org/browse/SERVER-380

From what I know of the ES river plugin for MongoDB, it tails the oplog for its functionality. A sharded setup would have multiple oplogs, and there would be no easy way to alter that code to connect via a mongos instead.
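In a sharded cluster each shard is its own replica set with its own `local.oplog.rs`, so a connector has to read every shard's oplog rather than going through mongos. A minimal sketch of combining those streams, assuming each shard's entries have already been read into a list in `ts` order (a real connector would tail them with tailable cursors), modelling `ts` as a `(seconds, increment)` tuple, and assuming chunk-migration writes carry a `fromMigrate: true` flag that should be skipped.

```python
import heapq
from typing import Iterable, Iterator

# Hypothetical in-memory stand-ins for each shard's oplog entries.
OplogEntry = dict


def merged_oplog(shard_oplogs: Iterable[list[OplogEntry]]) -> Iterator[OplogEntry]:
    """Merge per-shard oplogs (each already in ts order) into one ts-ordered stream."""
    streams = [iter(log) for log in shard_oplogs]
    for entry in heapq.merge(*streams, key=lambda e: e["ts"]):
        # Assumed flag: skip writes caused by chunk migration, not the application.
        if entry.get("fromMigrate"):
            continue
        yield entry


shard_a = [
    {"ts": (100, 1), "op": "i", "o": {"_id": 1}},
    {"ts": (103, 1), "op": "d", "o": {"_id": 1}},
]
shard_b = [
    {"ts": (101, 1), "op": "i", "o": {"_id": 2}},
    {"ts": (102, 1), "op": "i", "o": {"_id": 3}, "fromMigrate": True},
]
print([(e["ts"], e["op"], e["o"]["_id"]) for e in merged_oplog([shard_a, shard_b])])
# [((100, 1), 'i', 1), ((101, 1), 'i', 2), ((103, 1), 'd', 1)]
```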

Similarly for Solr, the examples I have seen usually involve similar behavior to the ES plugin. Some more solid info here:

http://blog.knuthaugen.no/2010/04/cooking-with-mongodb-and-solr.html

I have no experience using either, but others have made comparisons before; take a look here:

Solr vs. ElasticSearch

ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?

Adam Comerford

MongoDB can't do efficient full text search. You can do wildcard searches on fields, but I don't think these use indexes efficiently.
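Whether a wildcard search can use an ordered index depends on its shape: an anchored, case-sensitive regex such as `/^mongo/` can seek to the first matching key and stop at the first non-match, while an unanchored one such as `/ong/` has to test every key. The sketch below simulates that with a sorted Python list standing in for a B-tree index and counts the keys examined; it is an illustration, not MongoDB's query planner.

```python
import bisect
import re

# Sorted keys stand in for a B-tree index on a string field (a simulation).
index = sorted(["elastic", "mongo", "mongodb", "mongoose", "solr", "sphinx"])


def prefix_scan(keys: list[str], prefix: str) -> tuple[list[str], int]:
    """Anchored /^prefix/: jump to the first candidate, stop at the first miss."""
    start = bisect.bisect_left(keys, prefix)
    matches: list[str] = []
    examined = 0
    for key in keys[start:]:
        examined += 1
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches, examined


def full_scan(keys: list[str], pattern: str) -> tuple[list[str], int]:
    """Unanchored /pattern/: every key must be tested."""
    rx = re.compile(pattern)
    return [k for k in keys if rx.search(k)], len(keys)


print(prefix_scan(index, "mongo"))  # (['mongo', 'mongodb', 'mongoose'], 4)
print(full_scan(index, "ong"))      # (['mongo', 'mongodb', 'mongoose'], 6)
```

On six keys the difference is trivial; on millions of documents the unanchored case means scanning the whole index.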

I would recommend using the river functionality of ElasticSearch to automatically push the documents from MongoDB to ElasticSearch.

elasticsearch-river-mongodb is a MongoDB-to-Elasticsearch river: when a document changes in MongoDB, ElasticSearch, which monitors the oplog, automatically updates its index.

This minimizes the problem of keeping the two datastores in sync, as ElasticSearch just monitors Mongo's replication log (the oplog).
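A minimal sketch of what monitoring the oplog and updating the index amounts to: each oplog entry's `op` (`i` insert, `u` update, `d` delete) is replayed against a search index. This assumes the classic oplog entry shape (`o` holds the document or the change, `o2` holds the `_id` selector for updates) with updates logged either as full replacements or as `$set`; newer MongoDB versions can log updates in other forms. `ToyIndex` and its whitespace-split inverted index over a `body` field are hypothetical stand-ins for ElasticSearch.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical search index: _id -> document, plus word -> set of _ids."""

    def __init__(self) -> None:
        self.docs: dict = {}
        self.postings: defaultdict = defaultdict(set)

    def _unindex(self, doc_id) -> None:
        for word in self.docs.get(doc_id, {}).get("body", "").lower().split():
            self.postings[word].discard(doc_id)
        self.docs.pop(doc_id, None)

    def _index(self, doc: dict) -> None:
        self._unindex(doc["_id"])  # drop stale postings first
        self.docs[doc["_id"]] = doc
        for word in doc.get("body", "").lower().split():
            self.postings[word].add(doc["_id"])

    def apply(self, entry: dict) -> None:
        op = entry["op"]
        if op == "i":
            self._index(entry["o"])
        elif op == "u":
            doc_id = entry["o2"]["_id"]
            change = entry["o"]
            if "$set" in change:  # partial update: merge into the stored copy
                doc = {**self.docs.get(doc_id, {"_id": doc_id}), **change["$set"]}
            else:                 # full replacement
                doc = {**change, "_id": doc_id}
            self._index(doc)
        elif op == "d":
            self._unindex(entry["o"]["_id"])
        # Other op types (e.g. "n" no-op, "c" command) are ignored in this sketch.

    def search(self, word: str) -> set:
        return set(self.postings.get(word.lower(), set()))


idx = ToyIndex()
for entry in [
    {"op": "i", "ns": "blog.posts", "o": {"_id": 1, "body": "MongoDB full text search"}},
    {"op": "u", "ns": "blog.posts", "o2": {"_id": 1}, "o": {"$set": {"body": "Elasticsearch full text search"}}},
    {"op": "i", "ns": "blog.posts", "o": {"_id": 2, "body": "Solr search"}},
    {"op": "d", "ns": "blog.posts", "o": {"_id": 2}},
]:
    idx.apply(entry)
print(idx.search("search"))   # {1}
print(idx.search("mongodb"))  # set()
```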

Kelvin
AddersUK
  • A number of devs have commented that [regexp searches have been quite efficient](https://groups.google.com/forum/#!topic/mongodb-user/rG_Ffzb7TQA), even for 100k documents. – Dan Dascalescu Nov 05 '12 at 04:07

Mongo is not at all good for full text search. Obviously you need to index your fields for fast searching, and indexing fields containing BIG data (very long strings) will fail in Mongo. It has a limit of 1 KB per index entry; if your content is larger than that, it will be ignored by the index and will not show up in your search results. So if you are trying to perform full text search over your articles, Mongo is not a good choice at all.
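A quick way to see which documents would hit that limit before indexing. This sketch assumes the 1024-byte index key limit of MongoDB releases of that era and only counts the UTF-8 bytes of the string, ignoring BSON overhead (the `overhead` parameter is a placeholder for it). The skipped-from-the-index behavior described above matches 2.4 and earlier; starting with 2.6, MongoDB rejects such writes by default instead.

```python
INDEX_KEY_LIMIT_BYTES = 1024  # index entry limit in MongoDB releases of that era


def fits_index_key(value: str, overhead: int = 0) -> bool:
    """Rough check: UTF-8 byte length plus an assumed overhead must stay under the limit."""
    return len(value.encode("utf-8")) + overhead < INDEX_KEY_LIMIT_BYTES


def too_long_for_index(docs: list[dict], field: str) -> list:
    """Return the _ids of documents whose field value would not fit in an index key."""
    return [d["_id"] for d in docs if not fits_index_key(d.get(field, ""))]


articles = [
    {"_id": 1, "body": "short summary"},
    {"_id": 2, "body": "x" * 2000},
    {"_id": 3, "body": "é" * 600},  # 600 characters but 1200 UTF-8 bytes
]
print(too_long_for_index(articles, "body"))  # [2, 3]
```

Note that the limit is in bytes, not characters, so non-ASCII text reaches it sooner.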

Mahendra

Currently, as of MongoDB 2.4.6, there IS full text search in MongoDB, and it is more feature-rich than in previous versions. The capabilities of the new functionality are described at http://docs.mongodb.org/manual/core/text-search/.

Worth mentioning:

  • It tokenizes and stems the search term(s) during both index creation and text command execution.
  • It assigns a score to each document that contains the search term in the indexed fields. The score determines the relevance of a document to a given search query.
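To illustrate the two bullets, here is a toy version of such a pipeline: lowercase, drop stop words, stem, then score each document by the fraction of its terms that match a query term. The stop-word list, the suffix-stripping `stem` function, and the scoring formula are all simplified assumptions for illustration; MongoDB uses proper language-specific stemmers and its own weighting.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}  # tiny illustrative list
SUFFIXES = ("ing", "es", "ed", "s")  # crude stand-in for a real stemmer


def stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text: str) -> list[str]:
    """Tokenize, drop stop words, and stem: applied to documents and queries alike."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def score(document: str, query: str) -> float:
    """Toy relevance: fraction of the document's terms that match a query term."""
    doc_terms = Counter(terms(document))
    total = sum(doc_terms.values())
    if total == 0:
        return 0.0
    hits = sum(doc_terms[t] for t in set(terms(query)))
    return hits / total


docs = {
    1: "Indexing documents in MongoDB",
    2: "The index stores searched terms",
    3: "Sharding a cluster",
}
ranked = sorted(docs, key=lambda d: score(docs[d], "index searching"), reverse=True)
print([(d, round(score(docs[d], "index searching"), 2)) for d in ranked])
# [(2, 0.5), (1, 0.33), (3, 0.0)]
```

Stemming is what lets the query "searching" match "searched", and "index" match "Indexing".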

However, as this answer (from September 2013) https://stackoverflow.com/a/18631775/1920149 shows, Mongo still warns against using this functionality in production. It is still in beta.

Ev0oD

Full text search became usable in production environments with MongoDB as of version 2.6, by creating a text index on the required fields. See: text indexes in MongoDB.
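A minimal sketch of the 2.6-style query pieces as plain Python dicts: the text index key, a `$text` filter, and the `textScore` projection and sort. The field name `content` and the search terms are examples; with PyMongo and a running server you would pass them to `create_index` and `find(...).sort(...)`, which this snippet does not do.

```python
def text_search_spec(terms: str, field: str = "content", language: str = "english"):
    """Build the index key, filter, projection and sort for a MongoDB $text query."""
    index_keys = [(field, "text")]  # key spec accepted by PyMongo's create_index()
    query = {"$text": {"$search": terms, "$language": language}}
    projection = {"score": {"$meta": "textScore"}}  # expose the relevance score
    sort = [("score", {"$meta": "textScore"})]      # most relevant first
    return index_keys, query, projection, sort


index_keys, query, projection, sort = text_search_spec("coffee shop")
print(index_keys)
print(query)
# With PyMongo and a running server (not executed here):
#   coll.create_index(index_keys)
#   coll.find(query, projection).sort(sort)
```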

Salim Hamidi