Is there any Java library for converting a document from PDF to HTML?

Question

An open-source implementation is preferred.

I would like to know a solution for this too. PDFBox is able to do so (http://java.dzone.com/articles/converting-pdf-html-using?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+javalobby%2Ffrontpage+%28Javalobby+%2F+Java+Zone%29), but in a very limited way. — Alp, May 02 '11 at 11:15

score 2 · Accepted Answer · edited May 23 '17 at 10:32

This is not an easy task: PDF formatting is much richer than HTML's, and you must also extract images and link them, among other things.
Simple text extraction is much simpler (although not trivial).
The sidebar of your question lists a similar question, Converting PDF to HTML with Python, which points to a library (poppler, which is apparently written in C++ and could perhaps be accessed through JNI/JNA) and to a related question that offers even more answers.
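
To illustrate the "simple text extraction" route, here is a minimal sketch of the second half of that pipeline: turning already-extracted plain text into bare-bones HTML. It assumes the text was obtained elsewhere (for example from PDFBox's PDFTextStripper) and that paragraphs are separated by blank lines, which real PDF output does not guarantee. The class and method names are hypothetical, and all layout, fonts, and images are lost, which is exactly the limitation described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: wraps plain text (e.g. the output of a PDF text
// extractor) in minimal HTML. It does not read PDF files itself.
final class PlainTextToHtml {

    // Escape the characters that are special in HTML text content.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Assumption: a blank line separates paragraphs. Line breaks inside a
    // paragraph are collapsed into single spaces.
    static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        for (String chunk : text.split("\\R\\s*\\R")) {
            String trimmed = chunk.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed.replaceAll("\\s*\\R\\s*", " "));
            }
        }
        return result;
    }

    // Build a complete HTML document with one <p> element per paragraph.
    static String toHtml(String title, String text) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\"><title>")
            .append(escape(title))
            .append("</title></head>\n<body>\n");
        for (String p : paragraphs(text)) {
            html.append("<p>").append(escape(p)).append("</p>\n");
        }
        html.append("</body>\n</html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a PDF.
        String extracted = "First line\nstill first paragraph\n\nSecond paragraph with a < b";
        System.out.print(toHtml("Sample", extracted));
    }
}
```

Anything beyond this (positioned text, tables, embedded images) needs a real converter such as the libraries mentioned in this thread.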

score 1 · Answer 2 · answered Nov 04 '14 at 23:03

Try using PDFBox from the Apache Software Foundation.

dacracot


score 1 · Answer 3 · answered Dec 11 '08 at 11:08

The only ones I know of have to be paid for:

BFO
JPedal

Kablam

