Apache Tika parser char limit exception

Question

I am using the Apache Tika parser v1.24. We have large PDF files, and when parsing them we get the following error:

Exception: Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).

I tried setting the write limit parameter of BodyContentHandler to -1, but it didn't work.

Thanks in advance

Try this - https://stackoverflow.com/questions/31079433/how-to-read-large-files-using-tika — Ajay Srivastava, Feb 11 '21 at 12:10
How are you calling Apache Tika? With no code it is hard to tell what you did wrong... — Gagravarr, Feb 11 '21 at 13:03

score 1 · Answer 1 · answered Feb 13 '21 at 13:41

Use PDFBox to split the PDF file into one document per page; see the Splitter class.
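
A minimal sketch of that approach, assuming PDFBox 2.x is on the classpath (in PDFBox 3.x the document is loaded with Loader.loadPDF instead of PDDocument.load). The class name PdfPageSplitter and the helper extractTextPerPage are hypothetical names used only for illustration. For brevity the sketch extracts text with PDFBox's own PDFTextStripper rather than passing each page back to Tika; the point is that each single-page document is handled on its own, so no single extraction has to hold the whole file's text.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfPageSplitter {

    // Hypothetical helper: returns the extracted text of each page, in page order.
    public static List<String> extractTextPerPage(File pdfFile) throws IOException {
        List<String> pageTexts = new ArrayList<>();

        // The source document must stay open while the split pages are in use.
        try (PDDocument document = PDDocument.load(pdfFile)) {
            // By default, Splitter produces one PDDocument per page.
            Splitter splitter = new Splitter();
            List<PDDocument> pages = splitter.split(document);

            PDFTextStripper stripper = new PDFTextStripper();
            for (PDDocument page : pages) {
                // Close each single-page document once its text has been read.
                try (PDDocument singlePage = page) {
                    pageTexts.add(stripper.getText(singlePage));
                }
            }
        }
        return pageTexts;
    }
}
```

Calling PdfPageSplitter.extractTextPerPage(new File("large.pdf")) would return one string per page (the file name here is a placeholder). If Tika's output is needed, each single-page document could instead be saved to a stream and parsed separately.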

marek.kapowicki

