I am using Apache Tika to parse document content, and I am only interested in specific file formats (e.g. doc, docx). However, when I parse a Microsoft Word file that contains an embedded video, I get the following error:
java.lang.NoClassDefFoundError: com/googlecode/mp4parser/DataSource
I suspect this happens because Tika tries to parse the contents of the embedded .mp4 file and cannot find a class that the MP4 parser depends on. Since I am not interested in the contents of embedded files, is there a way to disable embedded-document parsing in Tika?
Thanks in advance.