I run a job board (PostJobFree.com) with about 1M resumes in it. Average resume size is about 4000 bytes. PostJobFree uses ASP.NET/C#/SQL Server 2008 R2 and MS SQL Full Text search.
From reading various articles, I believe that switching to Apache Lucene search would make searches faster and more scalable, but I have not tried it yet.
Question 1: Is it the right choice to switch from MS SQL Full Text Search to Apache Lucene at about the 1M-document mark, or would I not yet notice a significant increase in search speed? I anticipate about 10% growth per month in the number of searchable documents in my database.
Question 2: Which Lucene-based platform is better: Solr or ElasticSearch?
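To put the growth figure from Question 1 in perspective, here is a rough back-of-the-envelope sketch. It assumes the 10%/month growth compounds steadily and that the average resume stays at about 4000 bytes; the function names are mine, purely for illustration.

```python
import math

CURRENT_RESUMES = 1_000_000   # from the question: about 1M resumes today
MONTHLY_GROWTH = 0.10         # from the question: about 10% per month
AVG_RESUME_BYTES = 4000       # from the question: average resume size


def projected_documents(current: int, monthly_growth: float, months: int) -> int:
    """Compound growth: current * (1 + rate) ** months, rounded to whole documents."""
    return round(current * (1 + monthly_growth) ** months)


def months_to_reach(current: int, monthly_growth: float, target: int) -> int:
    """Approximate number of whole months until the projection reaches target."""
    if target <= current:
        return 0
    return math.ceil(math.log(target / current) / math.log(1 + monthly_growth))


if __name__ == "__main__":
    for months in (0, 12, 24, 36):
        docs = projected_documents(CURRENT_RESUMES, MONTHLY_GROWTH, months)
        raw_gb = docs * AVG_RESUME_BYTES / 1e9  # raw text only, not index size
        print(f"month {months:2d}: {docs:>12,d} resumes, ~{raw_gb:.1f} GB of raw text")
    print("months until 10M resumes:",
          months_to_reach(CURRENT_RESUMES, MONTHLY_GROWTH, 10_000_000))
```

If the growth rate holds, that is roughly 3.1M resumes after a year and about 25 months to reach 10M, so the question is less about today's 1M documents and more about where the index will be in a year or two.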
Here’s what I found so far:
1. Google Search Trends: http://www.google.com/trends/explore#q=elasticsearch%20search%2C%20solr%20search%2C%20sphinx%20search%2C%20%22sql%20server%22%20%22full-text%20search%22&cmpt=q
As of now (April 2013):
- Solr seems to be the most popular search platform at the moment, but its popularity did not grow at all over the last year (April 2012 – April 2013).
- ElasticSearch has been growing rapidly since the end of 2010, but is still only about 40% as popular as Solr.
- Sphinx search grew in popularity from 2006 to 2009 and has been in decline since 2009. It is now about as popular as ElasticSearch.
- SQL Server Full-Text search is in long-term decline.
Do these trends correlate with the quality of these search platforms?
2. Past StackOverflow questions
StackOverflow had a search platform comparison question in February 2010: ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?
The “ElasticSearch” answer by its creator, Shay Banon, was the favorite back then.
A similar question was asked later, in 2012: Solr vs. ElasticSearch
The most popular answer was again in favor of ElasticSearch.
3. Other
Nick Zadrosny (who runs both the Solr hosting service websolr.com and the ElasticSearch hosting service bonsai.io) was a proponent of ElasticSearch in April 2012: https://news.ycombinator.com/item?id=3833735
Here's Nick's answer today (April 2013):
Elasticsearch does tend to be a bit more beginner friendly compared to Solr. Elasticsearch has a nicer API and is definitely easier to set up and configure for a new application. That said, Solr still has some advantage in terms of maturity and robustness, and the learning curve isn't too unreasonable when you don't need to worry about production configuration. Beyond that, both share the same roots in Lucene, and offer similar functionality. Either should be equally appropriate for your needs.
I wonder what exactly "robustness" means here, and how that difference in robustness might change in the coming years.
4. Percolation
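ElasticSearch has a Percolation feature that should allow me to implement resume search alerts with immediate delivery: saved searches are registered up front, and each new resume is matched against them as it arrives. Does Solr have anything like that?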
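To make clear what I mean by percolation, here is a toy sketch of the idea in plain Python. It is not the ElasticSearch API: the `ToyPercolator` class, its methods, and the "all query terms must appear" matching rule are my own simplifications for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split on whitespace, trimming common punctuation."""
    return {word.strip(".,;:!?()").lower() for word in text.split()} - {""}


@dataclass
class ToyPercolator:
    """Hypothetical stand-in for a percolator: stores queries, matches documents."""
    alerts: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, alert_id: str, query: str) -> None:
        # A saved search alert is stored as the set of terms it requires.
        self.alerts[alert_id] = tokenize(query)

    def percolate(self, document: str) -> List[str]:
        # Reverse search: one new document is checked against every stored query.
        words = tokenize(document)
        return sorted(
            alert_id
            for alert_id, terms in self.alerts.items()
            if terms and terms <= words  # every query term occurs in the document
        )


if __name__ == "__main__":
    percolator = ToyPercolator()
    percolator.register("alert-1", "C# SQL Server")
    percolator.register("alert-2", "java developer")
    resume = "Senior developer with C#, ASP.NET and SQL Server experience."
    print(percolator.percolate(resume))
```

In this sketch, a resume is percolated the moment it is posted, and the returned alert IDs are the subscribers who should be notified immediately, instead of re-running every saved search on a schedule.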