
I am using the Apache Tika parser v1.24. We have large PDF files, and when parsing them we get the following error:

Exception: Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).

I tried setting the write limit parameter of BodyContentHandler to -1, but it didn't work.
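For reference, here is a minimal sketch of how a -1 write limit is normally passed when parsing with AutoDetectParser. It assumes Tika 1.x with the tika-parsers module on the classpath; the class and method names (ParseWithoutLimit, extractText, extractTextWithFacade) are hypothetical. One possible reason -1 appears not to work is that the text is actually extracted through the Tika facade, whose parseToString methods use their own limit (also 100,000 characters by default) and ignore any BodyContentHandler you create; that limit is changed with setMaxStringLength.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class ParseWithoutLimit {

    // Parser API: -1 disables BodyContentHandler's write limit
    // (the no-arg constructor defaults to 100,000 characters).
    public static String extractText(Path file) throws IOException, SAXException, TikaException {
        BodyContentHandler handler = new BodyContentHandler(-1);
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        try (InputStream stream = Files.newInputStream(file)) {
            parser.parse(stream, handler, metadata, new ParseContext());
        }
        return handler.toString();
    }

    // Facade API: the Tika class keeps its own maximum string length,
    // so it has to be lifted separately (-1 means no limit).
    public static String extractTextWithFacade(File file) throws IOException, TikaException {
        Tika tika = new Tika();
        tika.setMaxStringLength(-1);
        return tika.parseToString(file);
    }

    public static void main(String[] args) throws Exception {
        String text = extractText(Paths.get(args[0]));
        System.out.println(text.length() + " characters extracted");
    }
}
```

Note that lifting the limit keeps the whole extracted text in memory, so very large PDFs may still need a larger heap.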

Thanks in advance

Rawfodog

1 Answer


Use PDFBox to split the PDF file into single pages, then parse each page separately; see the Splitter class.
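A minimal sketch of splitting with Splitter, assuming PDFBox 2.x (the line bundled with Tika 1.24; PDFBox 3.x loads documents with Loader.loadPDF instead of PDDocument.load). The class name, method name and the page-N.pdf output naming are hypothetical. Each resulting one-page file can then be passed to Tika on its own, so no single parse has to produce the whole document's text.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

public class SplitPdfPerPage {

    // Splits the input PDF into one-page documents, saves each one to outDir
    // as page-1.pdf, page-2.pdf, ... and returns the written files in page order.
    public static List<File> splitPerPage(File input, File outDir) throws IOException {
        List<File> written = new ArrayList<>();
        try (PDDocument document = PDDocument.load(input)) {
            Splitter splitter = new Splitter();
            splitter.setSplitAtPage(1); // one page per output document (also the default)
            List<PDDocument> pages = splitter.split(document);
            for (int i = 0; i < pages.size(); i++) {
                File out = new File(outDir, "page-" + (i + 1) + ".pdf");
                // Save while the source document is still open, since the split
                // documents may share resources with it.
                try (PDDocument page = pages.get(i)) {
                    page.save(out);
                }
                written.add(out);
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        File outDir = new File(args[1]);
        outDir.mkdirs();
        for (File f : splitPerPage(new File(args[0]), outDir)) {
            System.out.println(f.getPath());
        }
    }
}
```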

marek.kapowicki