Possible Duplicate:
Full-text search in NoSQL databases

I am somewhat new to database querying and I was wondering what the best way to do this would be. I have a database of articles and want my users to be able to search them by keywords in the title, i.e. they type a string and the query selects all the titles containing that string.

What would be the most efficient way to do this? And what if I want to keep strings such as "the" or "it" from being matched?

I am using Mongoid, in case that helps.

Thanks in advance

EnriqueC
  • The search characteristics you're after (searching by keywords, ignoring stopwords, ...) are related to [Full Text Search](http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo), which as of MongoDB 2.0 is not an inbuilt server feature (though it has [been requested](https://jira.mongodb.org/browse/SERVER-380)). – Stennie Jul 20 '12 at 13:46
  • You can implement a form of tag-based search, but there are more powerful search engine products such as Solr and ElasticSearch. There are several previous discussions on SO such as [Full text search in NoSQL databases](http://stackoverflow.com/questions/5453872/full-text-search-in-nosql-databases). – Stennie Jul 20 '12 at 13:46

1 Answer

If your title is stored as a string, you could use the regular expression search supported by MongoDB. For example:

db.articles.find( { title : /acme.*corp/i } );

MongoDB uses PCRE for regular expressions. To exclude certain words from the search, I would recommend an application-side check, or you can use the $nin operator. For more info have a look here.
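One way the application-side check could look is sketched below. It drops stopwords from the user's input and builds a regex filter in the same spirit as `/acme.*corp/i`. The stopword list and the helper names `escapeRegex` and `buildTitleFilter` are made up for illustration; only the `$regex` and `$options` operators come from MongoDB itself.

```javascript
// Hypothetical stopword list; extend it to suit the application.
const STOPWORDS = new Set(["the", "it", "a", "an", "of", "and"]);

// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter document for db.articles.find().
// Returns null when nothing searchable remains after removing stopwords.
function buildTitleFilter(input) {
  const words = input
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 0 && !STOPWORDS.has(w));
  if (words.length === 0) return null;
  // Remaining words must appear in order, with anything in between.
  const pattern = words.map(escapeRegex).join(".*");
  return { title: { $regex: pattern, $options: "i" } };
}

console.log(JSON.stringify(buildTitleFilter("The Acme Corp")));
```

The resulting object can be passed straight to `db.articles.find(...)`. Note that this only strips stopwords from the search input; a case-insensitive, unanchored regex like this one still cannot use an index efficiently.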

golja
  • The syntax is correct, but the example will not make efficient use of indexes. See the notes on [Regular Expressions](http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-RegularExpressions), in particular on using case-insensitive `/i` and a match appearing anywhere in the string. – Stennie Jul 20 '12 at 13:38
  • Yeah, this is why I would tokenize the title, but in the end it's a design decision... – golja Jul 21 '12 at 03:13
  • As a design decision, it would be helpful to point out some of the obvious considerations. For example, if keyword searching is a frequent use case for an application then a full index scan for every query would be a poor implementation choice. Imagine using this with an AJAX autocomplete search where every keystroke would trigger a new search; now add millions of documents to your collection. An approach like tokenizing into [search tags](http://www.mongodb.org/display/DOCS/Multikeys) would be a better design, but still more limited than full text search with stemming and stopwords. – Stennie Jul 21 '12 at 12:33
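
The search-tags approach from the comments could be sketched roughly as follows. Each title is tokenized into a `keywords` array when the article is saved, and the same tokenizer is applied to the search input. The field name `keywords`, the stopword list, and both helper names are assumptions for illustration, and the tokenizer only handles ASCII letters and digits.

```javascript
// Hypothetical stopword list; not taken from MongoDB or Mongoid.
const TITLE_STOPWORDS = new Set(["the", "it", "a", "an", "of", "and"]);

// Split a title into lowercase keywords, dropping stopwords and duplicates.
// Assumption: splitting on non-alphanumeric ASCII is good enough here.
function tokenizeTitle(title) {
  const tokens = title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !TITLE_STOPWORDS.has(t));
  return [...new Set(tokens)];
}

// Build a query that matches articles containing every search keyword.
// Returns null when the input contains only stopwords.
function buildKeywordQuery(input) {
  const tokens = tokenizeTitle(input);
  return tokens.length > 0 ? { keywords: { $all: tokens } } : null;
}

// In the mongo shell, the pieces might be used like this:
//   db.articles.createIndex({ keywords: 1 })   // ensureIndex in older shells
//   db.articles.insert({ title: t, keywords: tokenizeTitle(t) })
//   db.articles.find(buildKeywordQuery("acme corp"))
console.log(tokenizeTitle("The Rise of Acme Corp: It Begins"));
```

Because `keywords` is an array, an index on it is a multikey index, so exact keyword lookups avoid scanning every title. As noted above, this still gives no stemming or partial-word matching.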