Document search in Lucene/Solr, Whoosh, Sphinx, Xapian

Question

I am comparing Lucene/Solr, Whoosh, Sphinx and Xapian for searching documents in DOC, DOCX, HTML and PDF formats. Only Solr is documented as having a document parser (Tika) that lets it index such documents directly, so it seems the clear winner.

But to level the playing field, I'd like to consider the alternatives. Do the others have direct document indexing that I may have missed? If not, can it be implemented easily? Or is Solr the overwhelming choice?

duplicate? http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage — kmote, Feb 07 '12 at 05:46
Not exactly. I wanted to specifically index rich documents at the time of this question. I chose Solr. I moved on to index DBs and rich documents with DB metadata. — Jesvin Jose, Feb 07 '12 at 06:22

score 0 · Answer 1 · answered Apr 12 '13 at 14:24

With Sphinx, you can convert the files with a PHP script and feed the result to the indexer through the xmlpipe_command option. Since PHP has a Tika wrapper, neither writing the script nor the setup itself is hard.
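To illustrate the idea, here is a minimal sketch of such a converter script, written in Python rather than PHP. It prints an xmlpipe2 stream to stdout, which is what Sphinx reads from the command named in xmlpipe_command. The field names `title` and `content`, the function names, and the `extract_text` placeholder are assumptions made for this example; a real setup would replace `extract_text` with a call to a Tika wrapper.

```python
import sys
from xml.sax.saxutils import escape, quoteattr


def build_docset(docs):
    """Return an xmlpipe2 stream for an iterable of (doc_id, title, text)."""
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        # Field names are illustrative; choose whatever the index needs.
        '<sphinx:field name="title"/>',
        '<sphinx:field name="content"/>',
        "</sphinx:schema>",
    ]
    for doc_id, title, text in docs:
        # Sphinx expects a unique positive integer id per document.
        lines.append("<sphinx:document id=%s>" % quoteattr(str(int(doc_id))))
        # escape() handles &, < and > so extracted text cannot break the XML.
        lines.append("<title>%s</title>" % escape(title))
        lines.append("<content>%s</content>" % escape(text))
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines) + "\n"


def extract_text(path):
    """Hypothetical placeholder: reads the file as plain text.

    For DOC, DOCX and PDF this is where a Tika wrapper would be called.
    """
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()


def main(paths):
    # Number the documents 1..n and use the file path as the title.
    docs = ((i, path, extract_text(path)) for i, path in enumerate(paths, start=1))
    sys.stdout.write(build_docset(docs))


if __name__ == "__main__":
    main(sys.argv[1:])
```

In sphinx.conf, the source would then use `type = xmlpipe2` and point `xmlpipe_command` at this script with the files to index as arguments (the exact command line depends on your installation).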

TBonamigo
