I have a Rails application that accepts uploads of arbitrary business documents, such as Word, Excel, PowerPoint, and PDF files. I need to make all of these documents searchable, preferably using Sphinx or PostgreSQL full-text search. What are the best solutions?
There's a related question here: http://stackoverflow.com/questions/1207995/indexing-word-documents-and-pdfs-with-sphinx – dtt101 Sep 23 '11 at 14:59
1 Answer
As pointed out in the comments, this is covered pretty well by an older question.
In short: you'll need to extract the relevant text from those files and store it in the database for Sphinx to index, and likely for PostgreSQL full-text search as well. Sphinx can now also index plain text files (as long as a database column points to the file), but that still requires another tool to extract the text from PDF, DOC, XLS, and similar formats.
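To illustrate the extraction step, here is a minimal sketch that shells out to external command-line tools and returns plain text. It assumes `pdftotext` (from Poppler) and `antiword` are installed on the server. The `EXTRACTORS` mapping and the `extraction_command` and `extract_text` helpers are hypothetical names made up for this example, and Excel and PowerPoint files would need extractors of their own.

```ruby
require "open3"

# Hypothetical mapping from file extension to an external extraction command.
# Assumption: these tools are installed; "-" makes pdftotext write to stdout.
EXTRACTORS = {
  ".pdf" => ->(path) { ["pdftotext", path, "-"] },
  ".doc" => ->(path) { ["antiword", path] },
  ".txt" => ->(path) { ["cat", path] }
}.freeze

# Returns the command (as an argv array) for a path, or nil if the
# extension has no registered extractor.
def extraction_command(path)
  builder = EXTRACTORS[File.extname(path).downcase]
  builder && builder.call(path)
end

# Runs the extractor and returns the plain text, or nil when the file type
# is unsupported, the tool is missing, or the tool exits with an error.
def extract_text(path)
  command = extraction_command(path)
  return nil unless command

  stdout, status = Open3.capture2(*command)
  status.success? ? stdout : nil
rescue Errno::ENOENT
  nil # the extraction tool is not installed
end
```

You would then save the result into a text column on the upload's model (say, a hypothetical `extracted_text` column) and point Sphinx or PostgreSQL full-text search at that column. Passing the command as an argv array rather than a single string avoids shell interpolation of uploaded file names.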

pat