I want to convert PDF files into XML. Is there a Java library that can be used for this?
-
Pdf2dom can do that, but not very well. – Tilman Hausherr Jan 03 '19 at 15:28
-
What do you expect the output to be? That said, library recommendations are off topic on Stack Overflow nowadays. – mkl Jan 03 '19 at 20:31
1 Answer
You can get an XML (XHTML) representation of a PDF document with the Apache Tika library, as follows. Note that parse() returns nothing; the markup is collected by the content handler:
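// Needs: java.io.FileInputStream, java.io.InputStream, org.xml.sax.ContentHandler,
// org.apache.tika.metadata.Metadata, org.apache.tika.parser.AutoDetectParser,
// org.apache.tika.sax.ToXMLContentHandler
InputStream stream = new FileInputStream("sample.pdf");
ContentHandler handler = new ToXMLContentHandler();
Metadata metadata = new Metadata();
AutoDetectParser parser = new AutoDetectParser();
// parse() returns void; the XHTML output accumulates in the handler
parser.parse(stream, handler, metadata);
System.out.println(handler.toString());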
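
If you then want to reshape that output into your own XML format, one option is to load the XHTML string into a DOM with the JDK's built-in parser and walk the elements you care about. The following is only a rough sketch under some assumptions: the class name XhtmlParagraphs and the method extractParagraphs are hypothetical helpers (not part of Tika), and the sample string in main is made up to stand in for handler.toString(); real Tika output for a PDF will contain more markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlParagraphs {

    // Hypothetical helper: returns the trimmed text of every non-empty <p>
    // element found in a well-formed XHTML string.
    static List<String> extractParagraphs(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // XHTML elements live in a namespace, so the parser must be namespace-aware.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // "*" matches <p> elements regardless of which namespace they are in.
        NodeList nodes = doc.getElementsByTagNameNS("*", "p");
        List<String> paragraphs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            String text = nodes.item(i).getTextContent().trim();
            if (!text.isEmpty()) {
                paragraphs.add(text);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample standing in for handler.toString(); not real Tika output.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<div class=\"page\"><p>First paragraph</p><p>Second paragraph</p></div>"
                + "</body></html>";
        for (String paragraph : extractParagraphs(xhtml)) {
            System.out.println(paragraph);
        }
    }
}
```

From the returned list you can build whatever target XML structure you need. Keep in mind that this only recovers text that Tika managed to extract, so layout details of the PDF are not preserved.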

Anil Agrawal