
I recently updated my existing Tika project from Tika 1.10 to 1.13. The only change I made was bumping the dependency version from 1.10 to 1.13. The project builds successfully, yet whenever I run the application I get this exception:

java.lang.RuntimeException: Unable to parse the default media type registry
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:580)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:51)
    at com.app.tikamanager.MetaParser.<init>(MetaParser.java:54)
    at com.app.services.MyService.HandleItemInThread(IntelligentDocumentsService.java:260)
    at com.app.intelligentservicebase.ItemHandlerThread.run(ItemHandlerThread.java:41)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tika.mime.MimeTypeException: Invalid type configuration
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:126)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:64)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:93)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:170)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    ... 10 more
Caused by: org.xml.sax.SAXNotRecognizedException: http://javax.xml.XMLConstants/feature/secure-processing
    at org.apache.xerces.parsers.AbstractSAXParser.setFeature(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.setFeatures(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParserImpl(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserFactoryImpl.setFeature(Unknown Source)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:119)
    ... 14 more

The exception is thrown from the constructor of my MetaParser class. The only thing that constructor does is initialize the AutoDetectParser:

    private final AutoDetectParser _tikaExtractor;

    public MetaParser()
    {
        _tikaExtractor = new AutoDetectParser();
    }

I am running the application on Ubuntu 14.04 with Oracle JDK 1.8.0_91-b14.

I looked online and found this exception mentioned a couple of times. One suggested fix was to install OpenJDK, but that was for an old version of Tika, and since the old version worked fine with the same JDK, I don't think that is the problem.

Is there something I need to do or initialize before calling the AutoDetectParser constructor?

Zaid Amir
  • Do you have an old copy of Xerces on your classpath by any chance? And if so, what happens when you remove it (to fall back to the JVM-default XML processor), or upgrade it to a recent copy? – Gagravarr Jun 21 '16 at 10:34
  • Near-duplicate: [Error unmarshalling XML in Java 8 “secure-processing org.xml.sax.SAXNotRecognizedException”](http://stackoverflow.com/questions/25644023/error-unmarshalling-xml-in-java-8-secure-processing-org-xml-sax-saxnotrecognize) – Gagravarr Jun 21 '16 at 10:36
  • @Gagravarr thanks, that worked. Xerces was a dependency of a dependency, I updated it to the latest version and now it works. Thanks – Zaid Amir Jun 21 '16 at 11:14

1 Answer


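Promoting comments to an answer: you have a very old version of Xerces on your classpath. Your JVM is picking that up as the default XML parser, so when Tika says "Hi JVM, can I have a safe XML parser" (one with the secure-processing feature switched on), the request fails with the SAXNotRecognizedException at the bottom of your stack trace.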

(Between 1.10 and 1.13, Tika improved how it does XML parsing, including setting safer defaults, which is why this started happening after your upgrade.)
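To check this outside of Tika, here is a minimal sketch that asks the default SAX parser factory for the same secure-processing feature named in the stack trace. The class and method names (`SecureProcessingCheck`, `supportsSecureProcessing`) are made up for illustration and this is not Tika's actual code; it only uses the standard JAXP API.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper, not part of Tika: probes whichever SAX parser
// factory the JVM selects by default on the current classpath.
public class SecureProcessingCheck {

    /** Returns true if the default factory accepts the secure-processing feature. */
    static boolean supportsSecureProcessing() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // Same feature URI that appears in the SAXNotRecognizedException message.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (Exception e) {
            // An old Xerces is expected to land here with SAXNotRecognizedException.
            System.err.println(factory.getClass().getName() + " rejected the feature: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("secure-processing supported: " + supportsSecureProcessing());
    }
}
```

Run it with the same classpath as the failing application. If it reports that the feature was rejected and names an `org.apache.xerces` class, the old Xerces jar is the likely culprit.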

You need to either remove your old Xerces jars, so that the JVM-supplied XML parser is used instead, or replace them with a more recent Xerces version.

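You may also find some of the advice in [Error unmarshalling XML in Java 8 “secure-processing org.xml.sax.SAXNotRecognizedException”](http://stackoverflow.com/questions/25644023/error-unmarshalling-xml-in-java-8-secure-processing-org-xml-sax-saxnotrecognize) helpful, especially if you're struggling to locate the pesky old Xerces jar in your build!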
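If it helps with the hunt, the following sketch prints which SAX parser factory class the JVM actually picked and where it was loaded from. `XmlParserLocator` and `describe` are hypothetical names for illustration; the calls themselves are standard JDK APIs.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: reports the origin of a loaded class.
public class XmlParserLocator {

    /** Describes a class and the jar or directory it was loaded from. */
    static String describe(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // On Java 8, classes bundled with the JDK usually have no code source;
        // newer JDKs may report a jrt:/ URL instead.
        String location = (source == null || source.getLocation() == null)
                ? "JDK built-in (no code source)"
                : source.getLocation().toString();
        return cls.getName() + " loaded from " + location;
    }

    public static void main(String[] args) {
        // Run with the application's classpath to see which parser wins.
        System.out.println(describe(SAXParserFactory.newInstance().getClass()));
    }
}
```

If the output points at a Xerces jar, that is the one to remove or upgrade. Assuming the build uses Maven (the question does not say), `mvn dependency:tree` should show which dependency pulls it in transitively.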

Gagravarr