Is there any way to convert a PDF to HTML? I need the text from the file, and when I tried the PDFtoText library I got the text, but it was out of order and had no structure I could parse. I noticed that some online PDF-to-HTML services work great with the file. Any tips, please? Here is the PDF file; I need only one specific row in the right column.

Droidik

2 Answers

Try integrating pdftohtml from the Poppler project; it should support table recognition.
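As a rough sketch of how that could look for the "one row in the right column" case: pdftohtml has an `-xml` mode that writes each text fragment with `top` and `left` coordinates, so the right column can be picked out by position. The function names, the `min_left` threshold, and the assumption that the output lands in `out_base + ".xml"` are mine, not from the answer, and the threshold has to be tuned for the actual PDF.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_pdftohtml_command(pdf_path, out_base):
    # -xml: emit positioned <text> elements; -i: ignore images.
    return ["pdftohtml", "-xml", "-i", pdf_path, out_base]


def convert(pdf_path, out_base):
    # Assumes Poppler's pdftohtml is installed and on PATH.
    # Assumption: the XML output is written to out_base + ".xml".
    subprocess.run(build_pdftohtml_command(pdf_path, out_base), check=True)
    return out_base + ".xml"


def right_column_rows(xml_text, min_left):
    """Return one string per row, built from fragments whose left edge
    is at or beyond min_left (a hypothetical column boundary in pixels)."""
    root = ET.fromstring(xml_text)
    rows = defaultdict(list)
    for page in root.iter("page"):
        page_no = int(page.get("number", "0"))
        for el in page.iter("text"):
            left = int(float(el.get("left")))
            top = int(float(el.get("top")))
            if left >= min_left:
                # itertext() also collects text nested in <b>, <i>, <a>.
                text = "".join(el.itertext()).strip()
                rows[(page_no, top)].append((left, text))
    # Sort rows top to bottom, and fragments within a row left to right.
    return [" ".join(t for _, t in sorted(cells))
            for _, cells in sorted(rows.items())]
```

Usage would be along the lines of `right_column_rows(open(convert("in.pdf", "out"), "rb").read(), 300)`, then indexing the row you need. Grouping by exact `top` value is a simplification; fragments on the same visual line can differ by a pixel or two, in which case rounding `top` to a tolerance may be needed.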

A T

pdftohtml works fine: it is fast and stable, but the HTML it produces is ugly at best. I have used it for quite some time on a website that hosts many job résumés.

It is a good solution for extracting textual content, however.
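Since the generated markup is not worth keeping, one way to use it is to throw the tags away and keep only the text. Below is a minimal sketch using only the Python standard library; the class and function names are illustrative, and it assumes the HTML from pdftohtml has already been read into a string.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in ("p", "div"):
            # Treat block-level closes as line breaks.
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace (including non-breaking spaces) per line
    # and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

This keeps the reading order that pdftohtml chose, so it helps only when that order is already acceptable for the file in question.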

I would also give the Scribd API a try:

http://www.scribd.com/developers/api

or the Google Apps document API. Google does a great job of displaying and converting PDF files.

Mohit Bumb