69

How can I extract text from a PDF document using PHP?

(I can't use other tools because I don't have root access.)

I've found some functions that work for plain text, but they don't handle Unicode characters well:

http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-2-pdf-data-extraction-437.html

Sfisioza
  • link doesn't work! please rectify! – cwiggo Nov 25 '12 at 20:12
  • I don't see why this question is considered off-topic. It is very useful, and even if it may attract 'opinionated' answers, it is always better to see different points of view. It has a lot of hits too. – user3574492 Jun 04 '15 at 23:54

1 Answer

58

Download class.pdf2text.php from https://pastebin.com/dvwySU1a or https://webcheatsheet.com/php/scripts/pdf2text.zip

Code:

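<?php
// Load the PDF2Text class from the downloaded file.
require_once 'class.pdf2text.php';

$a = new PDF2Text();
$a->setFilename('filename.pdf'); // path to the PDF to read
$a->decodePDF();                 // parse the file and extract its text
echo $a->output();               // print the extracted text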

  • class.pdf2text.php Project Home
  • The PDF2Text class doesn't work with all the PDFs I've tested. If it doesn't work for you, try PDF Parser.
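
For the PDF Parser fallback, a minimal sketch might look like the following. It assumes that "PDF Parser" refers to the smalot/pdfparser package and that it was installed locally with Composer (`composer require smalot/pdfparser`), which does not need root access. The helper name `extractPdfText` and the file name `filename.pdf` are hypothetical and used only for illustration.

```php
<?php
// Assumption: "PDF Parser" is the smalot/pdfparser package, installed with
// Composer into ./vendor next to this script.
require_once __DIR__ . '/vendor/autoload.php';

use Smalot\PdfParser\Parser;

/**
 * Hypothetical helper: return the text of every page in a PDF file.
 *
 * @throws InvalidArgumentException if the file cannot be read
 */
function extractPdfText(string $path): string
{
    // Fail early with a clear message instead of a parser error.
    if (!is_readable($path)) {
        throw new InvalidArgumentException("Cannot read PDF file: $path");
    }

    $parser = new Parser();
    $pdf = $parser->parseFile($path); // returns a Smalot\PdfParser\Document

    return $pdf->getText();           // text of all pages, concatenated
}
```

Usage would be along the lines of `echo extractPdfText('filename.pdf');`, wrapped in a `try`/`catch` because the parser can throw an exception on encrypted or malformed files.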
Pedro Lobito