
I'm suddenly getting a few hundred thousand log messages like this: "Operator cm has too few operands". I understand that this usually comes from a corrupt PDF file. I'd like to limit the number of errors logged.

Will PDFBox really produce this many errors while processing one file, or should it abort after the first one? It looks like all of these come from a single file.

If these all come from one file, is there a handy way to limit the logging per file, or to make PDFBox abort when it sees this error?
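A minimal sketch of a per-file cap, assuming the PDFBox messages end up in java.util.logging (for example through Apache Commons Logging's JDK 1.4 adapter, which PDFBox 2.x logs through); Log4j 2 and Logback have their own filter mechanisms, which are not shown here. The class name CappedLogFilter and the cap of 5 are hypothetical. The filter lets the first few records from the PDFStreamEngine logger through and drops the rest until reset() is called, which you would do before each file. It does not make PDFBox abort; it only limits what gets logged.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Filter;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical filter: passes at most maxRecords records, then drops the rest until reset().
public class CappedLogFilter implements Filter {
    private final int maxRecords;
    private final AtomicInteger seen = new AtomicInteger();

    public CappedLogFilter(int maxRecords) {
        this.maxRecords = maxRecords;
    }

    @Override
    public boolean isLoggable(LogRecord record) {
        // Count every record that reaches the filter; only the first maxRecords pass
        return seen.incrementAndGet() <= maxRecords;
    }

    // Call before processing each new file to start counting from zero again
    public void reset() {
        seen.set(0);
    }

    // Number of records dropped since the last reset()
    public int suppressedCount() {
        return Math.max(0, seen.get() - maxRecords);
    }

    // Keep a strong reference: JUL may garbage-collect an unreferenced logger together with its filter
    private static final Logger ENGINE_LOG =
            Logger.getLogger("org.apache.pdfbox.contentstream.PDFStreamEngine");

    public static void main(String[] args) {
        CappedLogFilter filter = new CappedLogFilter(5);
        ENGINE_LOG.setFilter(filter);

        filter.reset(); // before parsing a file
        // Simulate one broken file flooding the log
        for (int i = 0; i < 1000; i++) {
            ENGINE_LOG.severe("Operator cm has too few operands");
        }
        System.out.println("suppressed: " + filter.suppressedCount());
    }
}
```

Because the counter is shared, this only gives per-file limits when files are processed one at a time; with parallel parsing the counts from different files mix.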

I have read "Disabling logging on PDFBox", and I could turn off all logging for org.apache.pdfbox.contentstream.PDFStreamEngine, but that might hide useful messages.

This message is logged at level ERROR. If PDFBox continues processing anyway, shouldn't it log this message as WARN instead?

If this is an error in the file but it does not make processing fail, is there a way to detect that it occurred, so I can flag the file for review? I'm just calling Tika.parseToString.
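One way to flag a file, sketched under the same assumption that PDFBox's ERROR messages reach java.util.logging (Commons Logging's JDK 1.4 adapter maps error to SEVERE): attach a counting Handler to the "org.apache.pdfbox" parent logger, reset it before each parse, and check it afterwards. PdfErrorCounter is a hypothetical name, and the Tika call appears only as a comment; the demo logs a simulated record in its place.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical handler that counts SEVERE records from PDFBox loggers.
public class PdfErrorCounter extends Handler {
    private final AtomicInteger errors = new AtomicInteger();

    @Override
    public void publish(LogRecord record) {
        // Count only SEVERE (ERROR in Commons Logging terms); ignore warnings and info
        if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
            errors.incrementAndGet();
        }
    }

    @Override
    public void flush() {
        // Nothing buffered
    }

    @Override
    public void close() {
        // Nothing to release
    }

    public int errorCount() {
        return errors.get();
    }

    public void reset() {
        errors.set(0);
    }

    // Strong reference so the parent logger (and the attached handler) stays alive
    private static final Logger PDFBOX_LOG = Logger.getLogger("org.apache.pdfbox");

    public static void main(String[] args) {
        PdfErrorCounter counter = new PdfErrorCounter();
        // Records from child loggers such as ...contentstream.PDFStreamEngine propagate up to this handler
        PDFBOX_LOG.addHandler(counter);

        counter.reset();
        // String text = new Tika().parseToString(file);  // real call site; simulated below
        Logger.getLogger("org.apache.pdfbox.contentstream.PDFStreamEngine")
              .severe("Operator cm has too few operands");

        if (counter.errorCount() > 0) {
            System.out.println("flag for review: " + counter.errorCount() + " PDFBox error(s)");
        }
    }
}
```

As with any global handler, the count is not per-thread; if files are parsed concurrently, an error from one file can flag another.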

Thanks!

curiousity
  • If we flagged errors on PDFs, there would be lots of them, and support tickets saying "but it renders in Adobe Reader!!1! How dare you!". There are so many bad PDFs out there. So we just put out a log message and keep working. – Tilman Hausherr Jun 08 '23 at 03:11
  • I have no problem with that. My question is "will it potentially output tens of thousands of messages from one bad PDF?" – curiousity Jun 08 '23 at 03:19
  • In the worst case, yes. There are very few of these in the wild, but it happens. – Tilman Hausherr Jun 08 '23 at 04:12
