Is it possible to extract a table of data from a PDF file into an array (or similar) so that I can import the table data using PHP? I have DomPDF installed to create PDF files, but it has no options for reading PDFs. If I read the PDF file in PHP, I get an encoded string:

%PDF-1.5 5 0 obj <>>> endobj 6 0 obj <>stream x��ێ+��W�\`��E���u

Any help would be appreciated.

Adam

adam Kearsley
  • What do you want to achieve with that? Why PDF? – Hulk Nov 04 '13 at 13:16
  • I am receiving a PDF via email that contains an HTML table of data, which needs to be entered into our database. I can receive the email and save the PDF; I just cannot read the PDF. The third party sending the file cannot send it in any other format. – adam Kearsley Nov 04 '13 at 14:02
  • I'm afraid you are pretty much out of luck, then. There have been several questions looking for PDF parsers, e.g. [this one](http://stackoverflow.com/q/1251956/2513200), perhaps one of the answers there can help you. – Hulk Nov 04 '13 at 14:38

1 Answer

This post is pretty old but seems to have a decent amount of views.

I'm working on a similar project and have had some success with https://github.com/mgufrone/pdf-to-html. The HTML it returns is just a bunch of absolutely positioned p tags, but if the format of your PDFs is consistent, you might have some luck working out a way to either parse the table or at least get the data you need.

Just make sure that you have the poppler utilities installed.
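To give an idea of what "working something out" could look like, here is a minimal sketch that groups absolutely positioned p tags back into table rows. It does not call the library above; it only works on an HTML string you already have. It assumes each cell is a `<p>` whose inline `style` contains `top:NNpx` and `left:NNpx` (check your actual output, since this may differ), and the function name `positionedParagraphsToRows` and the `$tolerance` parameter are my own inventions for illustration.

```php
<?php
// Sketch only: assumes every table cell is a <p> with an inline style such as
// "position:absolute;top:100px;left:50px". Adjust the patterns to your output.
function positionedParagraphsToRows(string $html, int $tolerance = 2): array
{
    $doc = new DOMDocument();
    // Converted HTML is rarely valid markup, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Collect the position and text of every positioned paragraph.
    $cells = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $style = $p->getAttribute('style');
        if (!preg_match('/(?<![\w-])top:\s*(\d+)/', $style, $top)
            || !preg_match('/(?<![\w-])left:\s*(\d+)/', $style, $left)) {
            continue; // not a positioned paragraph
        }
        $cells[] = [
            'top'  => (int) $top[1],
            'left' => (int) $left[1],
            'text' => trim($p->textContent),
        ];
    }

    // Order cells top to bottom so that rows come out in reading order.
    usort($cells, fn($a, $b) => [$a['top'], $a['left']] <=> [$b['top'], $b['left']]);

    // Cells whose "top" lies within $tolerance pixels of the first cell
    // in the current row are treated as belonging to that row.
    $rows = [];
    $rowTop = null;
    foreach ($cells as $cell) {
        if ($rowTop === null || $cell['top'] - $rowTop > $tolerance) {
            $rows[] = [];
            $rowTop = $cell['top'];
        }
        $rows[count($rows) - 1][] = $cell;
    }

    // Within each row, order cells left to right and keep only the text.
    return array_map(function (array $row): array {
        usort($row, fn($a, $b) => $a['left'] <=> $b['left']);
        return array_column($row, 'text');
    }, $rows);
}
```

You would pass it the converted HTML for one page and get back an array of rows, each an array of cell strings. If your PDFs have cells that wrap onto several lines, grouping by `top` alone will split them, so you may need to merge rows by the `left` positions of the columns instead.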

AndoGrando