2

I have 1000+ searchable PDFs.

I need a plugin or application to index them, such as (http) joomla.natemaxfield.com

Nick

2 Answers

2

We use Swish-e to index our website, which includes thousands of PDFs, Word files and even WordPerfect files. It works great. It is free and open source, and it integrates well with PHP.

http://swish-e.org/index.html

From their homepage:

Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is ideally suited for collections of a million documents or smaller. Using the GNOME™ libxml2 parser and a collection of filters, Swish-e can index plain text, e-mail, PDF, HTML, XML, Microsoft® Word/PowerPoint/Excel and just about any file that can be converted to XML or HTML text. Swish-e is also often used to supplement databases like the MySQL® DBMS for very fast full-text searching.

Steve Massing
1

Take a look at PDFMiner. It can do what you want quite easily. Also, please search for similar questions as this is a possible dupe of: Python module for converting PDF to text
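
To make that a bit more concrete, here is a minimal sketch of how text pulled out with PDFMiner could feed a simple word index. It assumes the pdfminer.six package and its `pdfminer.high_level.extract_text` function; the helper names (`extract_pdf_text`, `build_index`, `search`, `index_folder`) are made up for this illustration and are not part of PDFMiner. It is a toy in-memory index, not a replacement for a real search engine.

```python
import re
from collections import defaultdict
from pathlib import Path


def extract_pdf_text(path):
    """Extract the text of one PDF (assumes pdfminer.six is installed)."""
    # Imported lazily so the indexing helpers below work without pdfminer.
    from pdfminer.high_level import extract_text
    return extract_text(str(path))


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it.

    `documents` is a dict of {document name: extracted text}.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


def index_folder(folder):
    """Hypothetical helper: index every *.pdf file directly inside `folder`."""
    docs = {p.name: extract_pdf_text(p) for p in Path(folder).glob("*.pdf")}
    return build_index(docs)
```

You would call something like `index = index_folder("pdfs")` once (the folder name is just a placeholder) and then `search(index, "some words")` to get the matching file names. For 1000+ files you would probably want to store the index on disk rather than rebuild it every time.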

Mahmoud Abdelkader