
I have a Rails application that accepts uploads of arbitrary business documents, such as Word, Excel, PowerPoint, and PDF files. I need to make all of these documents searchable, preferably using Sphinx or PostgreSQL full-text search. What are the best solutions?

dan
  • There's a related question here: http://stackoverflow.com/questions/1207995/indexing-word-documents-and-pdfs-with-sphinx – dtt101 Sep 23 '11 at 14:59

1 Answer


As pointed out in the comments, this is covered pretty well by an older question.

In short: you'll need to extract the text from those files and store it in the database for Sphinx, and most likely for PostgreSQL full-text search as well. Sphinx can now also index plain-text files directly (as long as a database column holds the file's path), but you will still need another tool to extract the text from PDF, DOC, XLS, and similar formats first.
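Here is a minimal Ruby sketch of that "another tool" step. It assumes the command-line extractors `pdftotext` (from poppler-utils) and `catdoc`, `xls2csv`, and `catppt` (from the catdoc package) are installed. The names `EXTRACTORS`, `extraction_command`, `extract_text`, and `UnsupportedDocument` are hypothetical and not part of Rails or Sphinx. It covers only PDF and the older binary Office formats; `.docx`, `.xlsx`, and `.pptx` would need a different extractor.

```ruby
require "open3"

# Hypothetical mapping from file extension to an argv array that prints the
# document's plain text on stdout. Assumes poppler-utils (pdftotext) and
# catdoc (catdoc, xls2csv, catppt) are installed on the server.
EXTRACTORS = {
  ".pdf" => ->(path) { ["pdftotext", "-enc", "UTF-8", path, "-"] },
  ".doc" => ->(path) { ["catdoc", path] },
  ".xls" => ->(path) { ["xls2csv", path] },
  ".ppt" => ->(path) { ["catppt", path] },
  ".txt" => ->(path) { ["cat", path] }
}.freeze

class UnsupportedDocument < StandardError; end

# Picks the extractor by (case-insensitive) file extension and returns its argv.
def extraction_command(path)
  ext = File.extname(path).downcase
  builder = EXTRACTORS.fetch(ext) do
    raise UnsupportedDocument, "no extractor for #{ext.inspect}"
  end
  builder.call(path)
end

# Runs the extractor and returns its stdout as a single compact String.
# +runner+ defaults to Open3.capture3 in argv form (no shell involved);
# tests can pass a stub that returns [stdout, stderr, status].
def extract_text(path, runner: Open3.method(:capture3))
  stdout, stderr, status = runner.call(*extraction_command(path))
  raise "extraction failed for #{path}: #{stderr}" unless status.success?

  # Collapse runs of whitespace so the stored column stays compact.
  stdout.gsub(/\s+/, " ").strip
end
```

In a Rails app you might call `extract_text` from a model callback and save the result in a text column (hypothetically `extracted_text`). Sphinx can then index that column like any other, and PostgreSQL can search it with `to_tsvector('english', extracted_text) @@ plainto_tsquery('english', ?)`. Passing an argv array instead of a shell string keeps user-supplied filenames away from the shell.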

pat