I'm trying to set up good natural language search on a website, and I'd like to understand the advantages of Apache Solr vs Xapian. Xapian seems easier to set up. Do both offer good natural language search? Any insight appreciated.

javanna
  • This could be useful: http://stackoverflow.com/questions/2488793/solr-vs-xapian-which-one-gived-you-the-more-meaningful-results – javanna Dec 29 '11 at 11:55
  • How do you define "natural language search"? Is it an Apple Siri-like interaction, e.g. "Find me ...", "What is ...", and so on? – Mikos Jan 06 '12 at 21:25

1 Answer

Xapian is more comparable to Lucene than to Solr: both are libraries that you integrate with your application. If you have a C++ app, then Xapian might be a better match. If you have a Java application, Lucene is almost certainly the best choice.

If you want a search server, then compare Omega (built on Xapian) to Solr (built on Lucene). I have not used Omega or Xapian, but Solr has a few features that I have come to depend on, especially the per-field analysis chains. That is a brilliant idea, and one that I wish I had thought of when I was working on Ultraseek.
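To illustrate what a per-field analysis chain means, here is a minimal plain-Java sketch. It does not use the Solr or Lucene APIs; the field names (`title`, `body`, `sku`), the toy plural stripper, and the class `PerFieldAnalysis` are all made up for the example. The idea is only that each field gets its own pipeline of tokenizing and filtering steps, so an identifier field can be kept verbatim while a free-text field is lowercased and stemmed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class PerFieldAnalysis {
    // Hypothetical registry: field name -> analysis chain for that field.
    static final Map<String, Function<String, List<String>>> CHAINS = new HashMap<>();

    static {
        // Titles: tokenize and lowercase only.
        CHAINS.put("title", text -> lowercase(tokenize(text)));
        // Body text: tokenize, lowercase, then apply a toy plural stripper.
        CHAINS.put("body", text -> stripPluralS(lowercase(tokenize(text))));
        // Identifiers: keep the whole value as a single, unmodified token.
        CHAINS.put("sku", text -> List.of(text.trim()));
    }

    // Split on runs of characters that are neither letters nor digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Toy stand-in for a real stemmer: drop one trailing "s" from longer words.
    static List<String> stripPluralS(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            boolean plural = t.length() > 3 && t.endsWith("s") && !t.endsWith("ss");
            out.add(plural ? t.substring(0, t.length() - 1) : t);
        }
        return out;
    }

    // Run the chain registered for the given field.
    static List<String> analyze(String field, String text) {
        Function<String, List<String>> chain = CHAINS.get(field);
        if (chain == null) {
            throw new IllegalArgumentException("No analysis chain for field: " + field);
        }
        return chain.apply(text);
    }

    public static void main(String[] args) {
        System.out.println(analyze("title", "Running Shoes"));
        System.out.println(analyze("body", "Running Shoes"));
        System.out.println(analyze("sku", " AB-123s "));
    }
}
```

In Solr itself the chains are declared per field type in the schema rather than in application code; the sketch above only mirrors the concept.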

It is quite easy to extend the Solr analysis chain with your own Java class. I expect that would be more difficult in C++ with Omega/Xapian.

The two engines use different underlying relevance models. Xapian is a probabilistic engine (its default weighting scheme is BM25), while Lucene is a vector space engine (TF-IDF based). I have seen both models tuned to perform well, so the relevance model alone might not be a reason to choose one over the other.
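To make the difference between the two families of models a bit more concrete, here is a rough sketch of how a single query term might be weighted under each. It assumes textbook formulas: a plain TF-IDF weight as the vector space representative and Okapi BM25 as the probabilistic one. Neither method reproduces the exact scoring code of Lucene or Xapian, and the class name, parameters, and corpus numbers are invented for the example.

```java
public class RelevanceModels {
    // Textbook TF-IDF term weight: grows linearly with term frequency.
    static double tfIdf(int tf, int df, int numDocs) {
        return tf * Math.log((double) numDocs / df);
    }

    // Textbook Okapi BM25 term weight: term frequency saturates, and the
    // score is normalized by document length relative to the average.
    static double bm25(int tf, int df, int numDocs,
                       double docLen, double avgDocLen, double k1, double b) {
        double idf = Math.log(1.0 + (numDocs - df + 0.5) / (df + 0.5));
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * (tf * (k1 + 1.0)) / (tf + norm);
    }

    public static void main(String[] args) {
        int numDocs = 10_000; // assumed corpus size
        int df = 100;         // assumed number of documents containing the term
        for (int tf : new int[] {1, 2, 5, 10, 50}) {
            System.out.printf("tf=%d  tfidf=%.3f  bm25=%.3f%n",
                    tf,
                    tfIdf(tf, df, numDocs),
                    bm25(tf, df, numDocs, 100.0, 100.0, 1.2, 0.75));
        }
    }
}
```

Running it shows the TF-IDF weight growing in proportion to the term count while the BM25 weight levels off, which is the kind of behavior that tuning (for example of `k1` and `b`) adjusts in practice.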

The Solr/Lucene community is large and very helpful.

Walter Underwood