I am trying to configure the SUTime annotator (part of "ner") to use my own date/time rule files INSTEAD of the out-of-the-box rule files that are located in "models/sutime/" in the distribution JAR for Stanford CoreNLP models.

I want to do this because I need to slightly modify what the SUTime rules do.

According to the official SUTime documentation, all it takes is specifying the "sutime.rules" property in the form of comma-separated file paths.
But after I did that, it appears that CoreNLP still takes the out-of-the-box rule files:

Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt

I tried both absolute paths and paths relative to my project root, with the same result.
It appears that, contrary to the documentation, the "sutime.rules" property is simply getting ignored.
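
For concreteness, here is a minimal sketch of the kind of configuration described above. The class name, the helper method, and the rule file paths are hypothetical placeholders rather than the actual project layout; only the "annotators" and "sutime.rules" property names come from the setup described in this question.

```java
import java.util.Properties;

public class SUTimeRulesConfig {

    // Builds pipeline properties with "sutime.rules" set to a
    // comma-separated list of the given rule file paths.
    static Properties buildProps(String... ruleFiles) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        props.setProperty("sutime.rules", String.join(",", ruleFiles));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical locations of the modified copies of the rule files.
        Properties props = buildProps(
                "rules/defs.sutime.txt",
                "rules/english.sutime.txt",
                "rules/english.holidays.sutime.txt");
        // These properties would then be handed to the CoreNLP pipeline
        // constructor; here they are only printed for inspection.
        System.out.println(props.getProperty("sutime.rules"));
    }
}
```

With properties along these lines, the log still shows the rule files being read from the models JAR.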

Any help will be greatly appreciated.

UPDATE:

The workaround in the form of:

  1. turning off SUTime as a part of the "ner" step
  2. copying its rule files and modifying them as necessary
  3. creating a custom annotator based on the TimeAnnotator class and adding it to the pipeline
  4. setting the .rules properties to the modified rule files

does not work.
The pipeline runs, but the functionality is not the same. The TimeAnnotator constructor needs to be invoked with the "sutime" parameter in order for its functionality to be exactly the same as if it were being called in the "ner" step.
This cannot be done via properties, it seems.

Gene M
    Please tell me how you made the custom annotator class work. I made one, but it still references the rules from the CoreNLP jar. – user2968505 Sep 12 '18 at 23:18

2 Answers


Thank you for letting us know that this is not working. We will look into this and fix it for the next release. If you do need to change the rules files slightly, you can try to place your own copy of edu/stanford/nlp/models/sutime/english.sutime.txt in the classpath before the CoreNLP models jar.
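
To check which copy would be picked up, a small diagnostic along these lines may help. It is only a sketch using the standard class loader API, not part of CoreNLP, and the class and method names are made up for illustration. It lists every copy of a resource visible on the classpath; the class loader typically returns them in search order, so the first entry is usually the one that gets loaded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ResourceOrderCheck {

    // Returns all classpath locations that provide the given resource,
    // in the order the class loader reports them.
    static List<URL> findCopies(String resourcePath) {
        ClassLoader loader = ResourceOrderCheck.class.getClassLoader();
        try {
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String path = "edu/stanford/nlp/models/sutime/english.sutime.txt";
        List<URL> copies = findCopies(path);
        if (copies.isEmpty()) {
            System.out.println("No copy of " + path + " on the classpath");
        }
        for (URL url : copies) {
            System.out.println(url);
        }
    }
}
```

If the models JAR shows up ahead of your own copy in this listing, the classpath order needs adjusting.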

Angel Chang
    Thank you, Angel. This workaround works. The only quirk that I am still experiencing is that I have to open the models JAR in my local Maven repository and delete the "english.sutime.txt" file from there. Only then my custom version of this file is taken from the project working directory. Hopefully this is fixed in CoreNLP 3.6.0 and I can get rid of this workaround by doing the proper configuration via the "sutime.rules" property. – Gene M Dec 22 '15 at 20:17

I too had a need to override the english.sutime.txt file. I accomplished this by creating an NERClassifierCombiner and using that when instantiating the NERCombinerAnnotator. Pseudo code:

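import java.util.Properties;
import java.util.Set;
import edu.stanford.nlp.ie.NERClassifierCombiner;
import edu.stanford.nlp.pipeline.NERCombinerAnnotator;
import edu.stanford.nlp.util.Generics;

Properties nerProps = new Properties();
nerProps.setProperty("sutime.rules", "your new comma separated file list");
// Pass down the default properties plus "sutime.rules" so it reaches SUTime.
Set<String> passDownProps = Generics.newHashSet();
passDownProps.addAll(NERClassifierCombiner.DEFAULT_PASS_DOWN_PROPERTIES);
passDownProps.add("sutime.rules");
NERClassifierCombiner combiner = NERClassifierCombiner.createNERClassifierCombiner("giveItAName", passDownProps, nerProps);
NERCombinerAnnotator nerAnnotator = new NERCombinerAnnotator(combiner, false);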

Hope that helps.

Jerry S