I want to convert PDF files into XML. Is there a Java library that can be used for this?

1 Answer

You can get an XML representation of a PDF document (Tika emits it as XHTML) using the Apache Tika library, as below:

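// Needs java.io.*, org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata,
// org.apache.tika.parser.AutoDetectParser and org.apache.tika.sax.ToXMLContentHandler
try (InputStream stream = new FileInputStream("sample.pdf")) {
    ContentHandler handler = new ToXMLContentHandler();
    Metadata metadata = new Metadata();
    AutoDetectParser parser = new AutoDetectParser();
    // parse() returns void; the XML output accumulates in the handler
    parser.parse(stream, handler, metadata);
    System.out.println(handler.toString());
}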
Anil Agrawal