PDF to Text using PHP - Windows Server

Question

Possible Duplicate:
How to extract text from the PDF document?

Problem / Application: I am building a system in PHP/Java on a Windows 2008 Server running Apache. The concept is that a user will upload a PDF file. I then want the system to analyze the uploaded PDF file and generate a Title/Description using an algorithm I am going to design. Later my search engine will be able to search through the stored titles/descriptions to find PDFs relevant to the search. This will allow me to search stored PDF files without accessing the PDFs during the search.

What I need is a script or code that converts the PDF to text and stores it in an array or a similar structure that I can then break down to get what I need.

I've found other threads that use Unix/Linux command-line techniques. However, I haven't found any scripts that will let me do this on an Apache server running on Windows.

Any suggestions or alternative techniques I could use for this would be greatly appreciated!

http://stackoverflow.com/questions/6999889/how-to-extract-text-from-the-pdf-document — Samuel Cook, Nov 16 '12 at 17:54
This class works pretty well (best one I've found): https://github.com/christian-vigh-phpclasses/PdfToText — dlofrodloh, Dec 07 '16 at 15:40

score 0 · Answer 1 · answered Nov 16 '12 at 18:03

Conversion of PDF files to plain text is problematic because of the way text is represented within them (as drawing instructions on a two-dimensional surface), especially when the source is multi-columnar.
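To illustrate why multi-column sources are awkward, here is a minimal Java sketch. It is not taken from any PDF library: the `Fragment` class, the sample coordinates, and the fixed column boundary are all hypothetical, and `y` is assumed to be measured downward from the top of the page. It only shows that ordering positioned text by row interleaves the columns, while a column-aware pass needs layout knowledge the file does not state directly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ColumnOrderSketch {
    /** One positioned run of text (hypothetical model); y grows downward from the top of the page. */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Naive reading order: top to bottom, then left to right, ignoring columns. */
    static String naiveOrder(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y)
                .thenComparingDouble(f -> f.x));
        return join(sorted);
    }

    /** Column-aware order: everything left of boundaryX first, then the rest, each top to bottom. */
    static String columnAwareOrder(List<Fragment> fragments, double boundaryX) {
        List<Fragment> left = new ArrayList<>();
        List<Fragment> right = new ArrayList<>();
        for (Fragment f : fragments) {
            (f.x < boundaryX ? left : right).add(f);
        }
        Comparator<Fragment> topToBottom = Comparator.comparingDouble(f -> f.y);
        left.sort(topToBottom);
        right.sort(topToBottom);
        left.addAll(right);
        return join(left);
    }

    /** Joins fragment texts with single spaces. */
    private static String join(List<Fragment> fragments) {
        StringBuilder out = new StringBuilder();
        for (Fragment f : fragments) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(f.text);
        }
        return out.toString();
    }

    /** A made-up two-column page: "Alpha Beta" on the left, "Gamma Delta" on the right. */
    static List<Fragment> samplePage() {
        List<Fragment> page = new ArrayList<>();
        page.add(new Fragment(300, 10, "Gamma"));
        page.add(new Fragment(50, 10, "Alpha"));
        page.add(new Fragment(300, 20, "Delta"));
        page.add(new Fragment(50, 20, "Beta"));
        return page;
    }

    public static void main(String[] args) {
        List<Fragment> page = samplePage();
        System.out.println(naiveOrder(page));              // Alpha Gamma Beta Delta
        System.out.println(columnAwareOrder(page, 200.0)); // Alpha Beta Gamma Delta
    }
}
```

The boundary of 200 is hard-coded here; a real converter has to guess it from the positions, which is one reason results vary between documents.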

A number of open source and proprietary tools are available, but having looked at all of them, I can confidently state that none works for all cases. A Google search for "PDF to text conversion" will show you most of them.
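As one possible way to drive such a tool from the Java side, the sketch below shells out to a command-line converter and splits the result into an array of lines. It assumes the `pdftotext` utility (distributed with Xpdf and Poppler, both of which have Windows builds) is installed and reachable by the name you pass in; the class and method names are my own, and the output will still need checking against your documents.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

final class PdfToTextSketch {
    /**
     * Builds the command line. "-layout" asks pdftotext to keep the physical layout,
     * "-enc UTF-8" sets the output encoding, and the trailing "-" sends text to stdout.
     */
    static List<String> buildCommand(String executable, Path pdf) {
        return Arrays.asList(executable, "-layout", "-enc", "UTF-8", pdf.toString(), "-");
    }

    /** Runs the converter (assumed to be installed) and returns everything it wrote to stdout. */
    static String extract(String executable, Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(executable, pdf))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // avoid blocking on an unread stderr
                .start();
        String text = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("converter exited with code " + exitCode);
        }
        return text;
    }

    /** Splits extracted text into an array of lines, accepting Windows and Unix line endings. */
    static String[] toLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PdfToTextSketch <file.pdf>");
            return;
        }
        String[] lines = toLines(extract("pdftotext", Paths.get(args[0])));
        System.out.println(lines.length + " lines extracted");
    }
}
```

On Windows the first argument can be the full path to `pdftotext.exe` if it is not on the PATH. The same command line can be run from PHP with `shell_exec`, though that variant is not shown here.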

You may also wish to explore a text search engine with PDF conversion built in, such as Solr or Elasticsearch. Both are open source and based on Apache Lucene. Again, a Google search for either will point you to their respective homepages.
