0

Possible Duplicate:
How to index pdf, ppt, xl files in lucene (java based or python or php any of these is fine)?

I need to search for a string in a collection of files in a folder that includes PDF, DOCX, and TXT formats. Is it possible to do this with Lucene.Net?

Please point me to some helpful references.

Thank you.

Chidambaram

1 Answer

5

You would need to extract the text from the various files (pdf, docx, txt) and add that text to a Lucene index. Lucene itself cannot read text out of the various document formats.

Generally, search for "extract {document format} text in .net" and you should find plenty of resources.
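
To illustrate the extract-then-index flow described above, here is a minimal, language-agnostic sketch in Python rather than .NET. It is not Lucene code: the small inverted index only stands in for a real Lucene index, and the names `extract_txt`, `extract_docx`, `EXTRACTORS`, `build_index`, and `search` are made up for this example. The .docx extractor relies on the fact that a .docx file is a zip archive whose body text is in `word/document.xml`. PDF is deliberately left out, since it needs a dedicated PDF library.

```python
import re
import zipfile
from collections import defaultdict
from pathlib import Path
from xml.etree import ElementTree

# WordprocessingML namespace used inside word/document.xml of a .docx file
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_txt(path):
    """Plain text needs no extraction beyond reading the file."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def extract_docx(path):
    """A .docx file is a zip archive; the body text is in word/document.xml."""
    with zipfile.ZipFile(path) as archive:
        root = ElementTree.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join all text runs that belong to one paragraph
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


# Hypothetical registry: one extractor per format. A ".pdf" entry backed by
# a PDF library of your choice would be registered here in the same way.
EXTRACTORS = {".txt": extract_txt, ".docx": extract_docx}


def build_index(folder):
    """Toy inverted index (term -> set of file names), standing in for Lucene."""
    index = defaultdict(set)
    for path in sorted(Path(folder).iterdir()):
        extractor = EXTRACTORS.get(path.suffix.lower())
        if extractor is None or not path.is_file():
            continue  # unsupported format or not a regular file: skip it
        for term in re.findall(r"\w+", extractor(path).lower()):
            index[term].add(path.name)
    return index


def search(index, term):
    """Return the sorted names of files whose extracted text contains the term."""
    return sorted(index.get(term.lower(), set()))
```

The point of the sketch is the split of responsibilities: the per-format extractors turn each file into plain text, and the index only ever sees that text. With Lucene.Net the `build_index` step would instead add the extracted text as a field of a document in the Lucene index.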

Prescott