
I am trying to add a custom mime type to Apache Tika.

I have the following `custom-mimetypes.xml` file in the `org.apache.tika.mime` package:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
    <mime-type type="text/stringtemplategroup">
        <glob pattern="*.stg"/>
    </mime-type>
    <mime-type type="text/stringtemplate">
        <glob pattern="*.st"/>
    </mime-type>
</mime-info>

I am getting a "Conflicting extension pattern" error for `.st`:

Caused by: org.apache.tika.mime.MimeTypeException: Conflicting extension pattern: .st
    at org.apache.tika.mime.MimeTypesReader.startElement(MimeTypesReader.java:166)
    at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)

How do I override the default entry for the `*.st` extension so that Tika uses my own?

  • Did you fix it? I am facing the same problem. Let me know if you figured it out – kittu Jun 17 '15 at 14:31
  • I gave up. Tika was a PITA because of some pretty bad design decisions, such as tightly coupling everything to a `File` object instead of an `InputStream`, so using it on Google App Engine was extremely hard and I had to fork and modify too much of it to make it less painful. I ended up writing my own magic number classifier for the handful of types I support in my application. Tika is a good idea with a terrible implementation. –  Jun 17 '15 at 15:17
  • OK, no luck for me then – kittu Jun 18 '15 at 05:01
  • A real shame to read this, as the developers on my team forked Tika and rewrote much of it to work from more of a stream model rather than stay coupled to `File`. Sadly, they weren't permitted to push the changes back to the project because of fears at the company they work for, and that was three or more years ago now! – default_avatar Aug 03 '17 at 15:38

1 Answer


It seems you need to add a `magic` element with a priority:

<mime-type type="text/stringtemplate">
    <magic priority="50">
        <!-- some match pattern -->
        <!-- <match value="[some characters]" type="string" offset="0" /> -->
    </magic>
    <glob pattern="*.st"/>
</mime-type>
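As a rough illustration of what the `magic`/`match` element above expresses, here is a minimal plain-Java sketch of a string match at offset 0 combined with priority-based selection, working on an `InputStream`. It is not Tika's code and not part of the answer above; the class `MagicSketch`, its methods, the priorities, and the marker strings are all hypothetical (the answer deliberately leaves the real match pattern unspecified), and Tika's actual detection logic does more than this.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class MagicSketch {

    // Hypothetical rule: a media type, the string expected at offset 0, and a priority.
    static final class MagicRule {
        final String mediaType;
        final String value;
        final int priority;

        MagicRule(String mediaType, String value, int priority) {
            this.mediaType = mediaType;
            this.value = value;
            this.priority = priority;
        }
    }

    // True if the bytes at offset 0 equal `value`, roughly what
    // <match type="string" offset="0" value="..."/> describes.
    static boolean matchesAtStart(InputStream in, String value) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] expected = value.getBytes(StandardCharsets.UTF_8);
        in.mark(expected.length);
        try {
            byte[] actual;
            try {
                actual = in.readNBytes(expected.length); // Java 11+
            } finally {
                in.reset(); // rewind so later rules (and the caller) see byte 0 again
            }
            return Arrays.equals(actual, expected);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Among the rules whose magic matches, prefer the one with the highest priority.
    static Optional<String> detect(InputStream in, List<MagicRule> rules) {
        MagicRule best = null;
        for (MagicRule rule : rules) {
            if (matchesAtStart(in, rule.value)
                    && (best == null || rule.priority > best.priority)) {
                best = rule;
            }
        }
        return Optional.ofNullable(best).map(r -> r.mediaType);
    }

    public static void main(String[] args) {
        List<MagicRule> rules = List.of(
                new MagicRule("text/plain", "ST", 40),                    // made-up marker
                new MagicRule("text/stringtemplate", "ST-TEMPLATE", 50)); // made-up marker
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(
                "ST-TEMPLATE hello".getBytes(StandardCharsets.UTF_8)));
        // Both rules match this input; the priority-50 rule wins.
        System.out.println(detect(in, rules).orElse("unknown")); // prints text/stringtemplate
    }
}
```

Because the check only peeks at the first bytes and then resets, the same stream can be handed on to a parser afterwards, which is the stream-oriented style the comments above wish Tika had used.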
LMC
  • Thanks for the info, but in the end, even if this works, it does not fix the tight coupling to `File`. –  Jun 14 '18 at 02:32
  • Thanks for the bounty, much appreciated. – LMC Jun 15 '18 at 23:48
  • Thanks for taking the time and trying to help [those that might find this](https://xkcd.com/979/). –  Jun 20 '18 at 01:39
  • :) Funnily enough, I got to this question by mistake when SO changed its UI. I didn't realize it was an old one until I had posted my answer. It caught my attention because, several years ago, I added custom magic numbers for QuarkXPress files to a Linux box acting as an Apple file server :p – LMC Jun 20 '18 at 02:10