Best text search engine for integrating with custom web app?

Question

We have a web app that allows users to upload documents, create their own documents, and so on. Uploaded files are stored on Amazon S3; documents created in the app are stored in a MySQL database. What I'm looking for is some sort of search engine where I feed it all of our text documents, each with a unique ID, and it builds an index. Later, I can give it search queries, and it will pull out the best matching documents (via their ID), along with snippets of matching text.

Basically we want to allow our users to search through their own repository of uploaded documents, along with anything that other users have marked as public. The solution should run on a standard Linux server, and ideally it would be open source, but I'll also consider paid solutions if they aren't outrageously priced.
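
As a rough illustration of the interface described above (documents go in with an ID, queries come back as IDs plus snippets), here is a toy in-memory inverted index. It is only a sketch: the class name `TinyIndex` and its methods are made up for this example and are not the API of Lucene, Sphinx, or any other engine, and the ranking is a naive count of matching query terms.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


class TinyIndex:
    """Hypothetical toy index: maps terms to document IDs, keeps text for snippets."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> IDs of documents containing it
        self.texts = {}                   # doc ID -> original text

    def add(self, doc_id, text):
        # Note: re-adding an existing ID does not remove its old terms.
        self.texts[doc_id] = text
        for term in WORD.findall(text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query, width=30):
        terms = set(WORD.findall(query.lower()))
        scores = defaultdict(int)
        for term in terms:
            for doc_id in self.postings.get(term, ()):
                scores[doc_id] += 1  # score = number of distinct query terms matched
        ranked = sorted(scores, key=lambda d: (-scores[d], str(d)))
        return [(doc_id, self._snippet(doc_id, terms, width)) for doc_id in ranked]

    def _snippet(self, doc_id, terms, width):
        # Return the text around the first word that matches any query term.
        text = self.texts[doc_id]
        for match in WORD.finditer(text):
            if match.group().lower() in terms:
                start = max(0, match.start() - width)
                end = min(len(text), match.end() + width)
                return text[start:end]
        return ""


index = TinyIndex()
index.add(1, "Quarterly sales report for the Berlin office")
index.add(2, "Holiday photos from Berlin")
for doc_id, snippet in index.search("berlin report"):
    print(doc_id, snippet)
```

A real engine adds stemming, relevance scoring, persistence, and per-user filtering (for example, restricting results to a user's own documents plus public ones) on top of this basic shape.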

So far, I've found three potential candidates:

MySQL Full Text Search - some reports I've read say it's very slow.
Apache Lucene - unfortunately written in Java, but I'll use it if I have to. Supposedly fast.
Sphinx - doesn't seem to be as popular; ideally, whatever solution I find will have lots of community support.

Please let me know if there are any other good choices that I've overlooked, or if you have experience with any of the above.

It looks like there are a lot of fans of Lucene, so I'll probably do more research on that. If anyone can give pros/cons of the different engines, that would be great. – davr, Sep 22 '08 at 23:02
There is a very similar question asked here [http://stackoverflow.com/questions/34314/how-do-i-implement-search-functionality-in-a-website](http://stackoverflow.com/questions/34314/how-do-i-implement-search-functionality-in-a-website) — Marcel, Sep 22 '08 at 22:29

score 5 · Accepted Answer · answered Sep 29 '08 at 16:12

Take a look at Solr. It's based on Lucene, so it's very fast, and it's really easy to use from any platform.
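
A minimal sketch of what talking to Solr over plain HTTP could look like from Python, using only the standard library. It assumes a Solr instance at `http://localhost:8983/solr` with a core named `documents` whose schema has `id` and `text` fields; the core name, field names, and helper function names are all assumptions made for illustration. It relies on Solr's JSON update handler and the `select` handler with highlighting enabled (`hl`, `hl.fl`), which is how snippets of matching text come back.

```python
import json
import urllib.parse
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: local Solr on its default port
CORE = "documents"                        # hypothetical core name


def build_update_request(docs, base=SOLR_BASE, core=CORE):
    """Build a POST request that sends a list of documents to Solr and commits."""
    url = f"{base}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def build_select_url(query, snippet_field="text", base=SOLR_BASE, core=CORE, rows=10):
    """Build a search URL that asks for IDs, scores, and highlighted snippets."""
    params = {
        "q": query,
        "df": snippet_field,     # default field to search when the query names none
        "fl": "id,score",        # only return the ID and relevance score
        "rows": rows,
        "hl": "true",            # turn on highlighting (snippets)
        "hl.fl": snippet_field,  # field to take snippets from
        "wt": "json",
    }
    return f"{base}/{core}/select?{urllib.parse.urlencode(params)}"


def search(query):
    """Run a query against a live Solr and return (id, snippets) pairs."""
    with urllib.request.urlopen(build_select_url(query)) as resp:
        data = json.load(resp)
    snippets = data.get("highlighting", {})
    return [
        (hit["id"], snippets.get(hit["id"], {}).get("text", []))
        for hit in data["response"]["docs"]
    ]
```

With a server running, indexing would be `urllib.request.urlopen(build_update_request([{"id": "doc-1", "text": "..."}]))` followed by `search("some words")`. Because everything goes over HTTP, the same requests can be issued from PHP, Ruby, or any other language the web app is written in.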

Mauricio Scheffer

I originally started messing with Lucene, but then found Solr, which is a lot easier to use out of the box. I've got something built on it with only a few hours' work, and it's pretty nice. – davr Sep 30 '08 at 01:03

score 2 · Answer 2 · answered Sep 29 '08 at 16:15

Sphinx may be worth your consideration, as it works well with several common RDBMSs (notably MySQL).

Marc Gear

This one also looks pretty good, but I already started down the road with Lucene/Solr. I'll definitely look more into Sphinx if I run into trouble with Lucene/Solr. – davr Sep 30 '08 at 01:04

score 1 · Answer 3 · answered Sep 29 '08 at 15:34

There is also Xapian, which is fast and quite customizable.

It supports custom indexers, so you can index data that is not stored in a database, which might be useful for your documents stored on S3.

sock

score 0 · Answer 4 · answered Sep 22 '08 at 22:26

I imagine that Google will have a solution that meets your needs. Start here: Google Enterprise

teratorn

This isn't quite what I'm looking for. I'm looking for something a little more low-level that I can tightly integrate with our application. It looks like there are lots of votes for Lucene, so that's probably what I'll end up with unless I find something better. – davr Sep 22 '08 at 23:08

score 0 · Answer 5 · answered Sep 22 '08 at 22:42

There is a Ruby port of Lucene called "Ferret". In addition to the Ruby API, you can get at the underlying C implementation, called "cFerret".

AShelly

And there's no Java involved! – AShelly Sep 22 '08 at 22:53

score 0 · Answer 6 · answered Sep 22 '08 at 22:46

Lucene is very good. And although it was originally written in Java, there is a PHP implementation: http://framework.zend.com/manual/en/zend.search.lucene.html

Ryan White
