I am using Apache Tika for document content parsing and am interested only in specific file formats (e.g. doc, docx). But when I parse a Microsoft Word file that contains a video as an embedded file, I see the following error:
java.lang.NoClassDefFoundError: com/googlecode/mp4parser/DataSource

I suspect this is because Tika is trying to parse the contents of the embedded .mp4 file and cannot find a class it depends on. Since I am not interested in parsing embedded file contents, is there a way to disable this in Tika?

Thanks in advance.

Newbie
  • You generally have to *explicitly enable* processing of embedded resources with Tika! How are you calling Tika? What code? Have you enabled recursion? – Gagravarr Jun 15 '18 at 11:53
  • See [tika-parser-exclude-pdf-attachments](https://stackoverflow.com/questions/50817271/tika-parser-exclude-pdf-attachments/50841701#50841701) – Tim Allison Jun 15 '18 at 12:23

0 Answers