We have two sites, one developed in Ruby on Rails and the other in Python (Django). MongoDB is used as the data store for both. The sites are login-based, so users can see only their own data, not other users'. There are also many models in MongoDB, and these models are inter-related.
We have to develop a search feature similar to Gmail search. The Gmail search box supports field prefixes such as `label:`, `to:`, `from:`, and `attachment:` for filtering; if none of these fields is used, a plain keyword search is performed. What is astonishing is that Gmail fetches results for any search query in under one second, even on a 256 kbps connection.
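For reference, the field-prefix syntax described above can be parsed with a few lines of Python. This is only a sketch: the field names mirror Gmail's operators, and the function name is our own, not part of any library.

```python
import re

# Operators to recognize, mirroring Gmail's label:/to:/from:/attachment: syntax.
FIELDS = {"label", "to", "from", "attachment"}

def parse_query(query):
    """Split a Gmail-style query into field filters and free-text terms."""
    filters = {}
    terms = []
    for token in query.split():
        match = re.match(r"^(\w+):(.+)$", token)
        if match and match.group(1) in FIELDS:
            filters[match.group(1)] = match.group(2)
        else:
            terms.append(token)
    return filters, terms

# Field filters are extracted; everything else stays a plain keyword.
print(parse_query("from:alice attachment:pdf quarterly report"))
# → ({'from': 'alice', 'attachment': 'pdf'}, ['quarterly', 'report'])
```

The filters can then be translated into per-field conditions on the query sent to the backend, with the remaining terms used for the keyword search.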
Searching for a keyword by issuing separate queries against every model is not feasible, so we searched Google for solutions on crawling and indexing database data.
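One common alternative to querying every model at search time is to build a single inverted index offline: a background job walks each collection and maps every keyword to the documents that contain it, so a search becomes one lookup instead of N model queries. A minimal in-memory sketch of the idea (the model names and records here are made up):

```python
from collections import defaultdict

# keyword -> set of (model, doc_id) pairs. A real deployment would persist
# this (e.g. in its own collection) and rebuild it incrementally.
index = defaultdict(set)

def index_document(model, doc_id, text):
    """Add one record's text to the shared index, one keyword at a time."""
    for word in text.lower().split():
        index[word].add((model, doc_id))

def search(keyword):
    """One dictionary lookup returns matches across all models at once."""
    return index.get(keyword.lower(), set())

# Hypothetical records from two different models:
index_document("Invoice", 1, "Quarterly report for ACME")
index_document("Message", 7, "Please review the quarterly numbers")

print(sorted(search("quarterly")))
# → [('Invoice', 1), ('Message', 7)]
```

Dedicated engines like Solr and Sphinx maintain essentially this structure (plus ranking, stemming, and faceting) on disk; the point is that the index is built ahead of time, not computed per query.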
A Google search for "search engine" turned up results about crawling and indexing web pages. The tools mentioned are Lucene/Solr + Nutch and Sphinx, but that stack is aimed at web pages: Nutch crawls pages and stores the keywords, and Solr indexes those keywords and searches them.
Googling "database search engine" doesn't turn up any concrete results either.
In this link, the second point states that MongoDB and similar stores seem to serve cases where there is no requirement for searching and/or faceting. Does that mean crawling and indexing MongoDB is not feasible?
In a general sense, is there anything like crawling and indexing for databases, irrespective of the database tool (MySQL, SQLite, PostgreSQL, MongoDB, etc.)?
Update:
The sites we have developed are very similar to Gmail, except that they are not mail services; we just need to develop a search feature. Just as Gmail users can see their own mail and nobody else's, content on our sites is specific to each user. Hope that clarifies the problem.
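Because results must be scoped per user (as in Gmail), the owning user can be stored alongside each indexed document and filtered out at query time. A sketch extending the shared-index idea, with illustrative names only:

```python
from collections import defaultdict

# keyword -> set of (user_id, model, doc_id). Every posting carries its owner,
# so a single shared index still returns only the searching user's documents.
index = defaultdict(set)

def index_document(user_id, model, doc_id, text):
    for word in text.lower().split():
        index[word].add((user_id, model, doc_id))

def search(user_id, keyword):
    # Restrict postings to the logged-in user before returning anything.
    return {(m, d) for (u, m, d) in index.get(keyword.lower(), set())
            if u == user_id}

index_document(42, "Message", 1, "project status report")
index_document(99, "Message", 2, "project kickoff notes")

print(search(42, "project"))  # only user 42's documents
# → {('Message', 1)}
```

The same per-user scoping works with an external engine: each indexed document gets a `user_id` field, and every query adds a mandatory filter on it.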