Is there any way to convert a PDF to HTML? I need text from the file, and when I tried the PDFtoText library I got the text, but it was unordered and had no structure I could parse. I noticed that some online PDF-to-HTML services work well with this file. Any tips, please? Here is the PDF file; I need only one specific row in the right column.
2 Answers
0
Try integrating pdftohtml from the Poppler project; it should support table recognition.
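A minimal sketch of how that could look from Python, assuming Poppler's `pdftohtml` binary is installed and on the PATH. It requests XML output (`-xml`), in which each text fragment carries `top` and `left` coordinates, and then keeps only the fragments that start to the right of a threshold. The function names and the `min_left` threshold are placeholders for illustration, not values taken from the PDF in the question, and the XML is not guaranteed to be well-formed for every input file.

```python
import subprocess
import xml.etree.ElementTree as ET


def pdf_to_xml(pdf_path):
    """Run Poppler's pdftohtml and return its XML output as bytes."""
    result = subprocess.run(
        # -xml: positional XML output, -i: ignore images, -stdout: no files
        ["pdftohtml", "-xml", "-i", "-stdout", pdf_path],
        check=True,
        capture_output=True,
    )
    return result.stdout


def right_column_lines(xml_text, min_left):
    """Return text fragments whose left edge is at or beyond min_left,
    ordered by page and then from top to bottom."""
    root = ET.fromstring(xml_text)
    fragments = []
    for page_index, page in enumerate(root.iter("page")):
        for node in page.iter("text"):
            left = float(node.get("left", "0"))
            top = float(node.get("top", "0"))
            # itertext() also collects text nested in <b>/<i> children
            text = "".join(node.itertext()).strip()
            if text and left >= min_left:
                fragments.append((page_index, top, left, text))
    fragments.sort()
    return [text for _, _, _, text in fragments]
```

Something like `right_column_lines(pdf_to_xml("input.pdf"), 300)` would then return the right-hand column as a list, from which the one needed row can be picked by index or by matching its label. The value 300 is a guess; inspecting the `left` attributes in the XML for the actual file is the reliable way to choose it.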

A T
0
pdftohtml works fine: it is fast and stable, but the HTML it produces is ugly at best. I used it for quite some time on a website that hosts many job résumés.
It is a good solution for extracting textual content, however.
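If only the text matters and the markup can be thrown away, the HTML that pdftohtml writes can be reduced to plain text with the Python standard library. This is a rough sketch, not part of pdftohtml itself; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            # Collapse runs of whitespace left over from the layout markup
            text = " ".join(data.split())
            if text:
                self.chunks.append(text)


def html_to_text(html):
    """Return one line of text per text node found in the HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Each text node ends up on its own line, so a sentence that contains inline tags such as `<b>` is split across several lines; for picking out a single row that is usually acceptable.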
I would give the Scribd API a try:
http://www.scribd.com/developers/api
or the Google Apps document API. Google does a great job of displaying and converting PDF files.

Mohit Bumb