
According to CoreNLP's Git repository, this issue has been fixed in some release of CoreNLP. My guess is 3.5.1, since NER is listed as one of the changed modules in its change notes. However, 3.5.x requires the jump to Java 1.8, and we are not prepared to make that jump at the current time.

Disclaimer: I also posted to that issue, but my comment may not have been seen because the issue has already been resolved. Given that SO is an official support forum for CoreNLP, I am asking here.

So my question is: what was the change that fixed this? Does the fix in fact exist in a current version, or is there something else that needs to be done? I need to fix this without upgrading from 3.4.1, which I am currently using.

For the record, the string below is supposed to represent Dec 3, 2009 at 10:00 (no seconds are given in that string, so we assume 00 as well).
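
To make the intended reading concrete, here is a minimal sketch that parses the string with plain `java.text.SimpleDateFormat`, which is available on Java 7. The pattern `yyyyMMddHHmm`, the UTC time zone, and the names `CompactTimestampSketch` and `parseCompact` are my own assumptions for illustration; none of this is CoreNLP code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

class CompactTimestampSketch {

    // Hypothetical helper: reads a 12-digit string as year, month, day, hour, minute.
    // The pattern and the UTC zone are assumptions; the source only says what the
    // string is supposed to mean, not how it is encoded.
    static Calendar parseCompact(String text) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmm", Locale.ROOT);
        format.setLenient(false); // reject out-of-range values such as month 13
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ROOT);
            calendar.setTime(format.parse(text));
            return calendar;
        } catch (ParseException e) {
            // Wrap the checked exception so callers get a plain runtime failure.
            throw new IllegalArgumentException("Not a yyyyMMddHHmm timestamp: " + text, e);
        }
    }

    public static void main(String[] args) {
        Calendar c = parseCompact("200912031000");
        // Calendar months are zero-based, so add 1 for display.
        System.out.println(c.get(Calendar.YEAR) + "-" + (c.get(Calendar.MONTH) + 1)
                + "-" + c.get(Calendar.DAY_OF_MONTH) + " " + c.get(Calendar.HOUR_OF_DAY)
                + ":" + c.get(Calendar.MINUTE) + ":" + c.get(Calendar.SECOND));
    }
}
```

Seconds are not present in the pattern, so they default to 0, which matches the assumption stated above.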

Here is the stack trace.

    java.lang.NumberFormatException: For input string: "200912031000"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:583)
        at java.lang.Integer.valueOf(Integer.java:766)
        at edu.stanford.nlp.ie.pascal.ISODateInstance.extractDay(ISODateInstance.java:1107)
        at edu.stanford.nlp.ie.pascal.ISODateInstance.extractFields(ISODateInstance.java:398)
        at edu.stanford.nlp.ie.pascal.ISODateInstance.<init>(ISODateInstance.java:82)
        at edu.stanford.nlp.ie.QuantifiableEntityNormalizer.normalizedDateString(QuantifiableEntityNormalizer.java:363)
        at edu.stanford.nlp.ie.QuantifiableEntityNormalizer.normalizedDateString(QuantifiableEntityNormalizer.java:338)
        at edu.stanford.nlp.ie.QuantifiableEntityNormalizer.processEntity(QuantifiableEntityNormalizer.java:1018)
        at edu.stanford.nlp.ie.QuantifiableEntityNormalizer.addNormalizedQuantitiesToEntities(QuantifiableEntityNormalizer.java:1320)
        at edu.stanford.nlp.ie.NERClassifierCombiner.classifyWithGlobalInformation(NERClassifierCombiner.java:145)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifySentenceWithGlobalInformation(AbstractSequenceClassifier.java:322)
        at edu.stanford.nlp.pipeline.NERCombinerAnnotator.doOneSentence(NERCombinerAnnotator.java:148)
        at edu.stanford.nlp.pipeline.SentenceAnnotator.annotate(SentenceAnnotator.java:95)
        at edu.stanford.nlp.pipeline.NERCombinerAnnotator.annotate(NERCombinerAnnotator.java:137)
        at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:67)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:847)

EDIT

I am looking at this again because I am currently working on some SUTime portions of my code, and I can reproduce the problem simply by doing:

    ISODateInstance idi = new ISODateInstance();
    boolean fields = idi.extractFields("200912031000");
    System.out.println(fields);

Note that true is the printed value.

demongolem

2 Answers


I don't see this problem with Stanford CoreNLP 3.4.1. I downloaded the 3.4.1 distribution and ran it on a sentence containing a really long number, and I don't get any kind of crash.

Can you provide me a sample sentence that causes this crash?

StanfordNLPHelp
  • I can't really provide the whole document, but I hope the line in question is good enough: `DATETIME_END: 200912031000`. This happens in testing (classification) with a model that has already been created. The token is tagged as a DATE entity, and the problem occurs when it is then processed as a DATE entity (as you can see from the attempt to parse a day out of this String). – demongolem Jan 26 '16 at 13:20
  • Also, I am instantiating `nerCombiner = new NERClassifierCombiner(applyNumericClassifiers, useSUTime, props, // this parameter should contain SUtime properties myClassifierArray);` and I find if the first argument is manually set to false, the error will not occur. So mostly we are just accessing NER from the pipeline, but under some circumstances we have to construct it ourselves, and this is one such case. – demongolem Jan 26 '16 at 13:31

OK, so let me explain why the problem existed. There were two problems with extractDay() in 3.4.1:

  1. Integer.valueOf is used on line 1107. This causes the error we see because the String, read as a number, is 200912031000, which is larger than Integer.MAX_VALUE (2147483647) and so only fits in a Long. Long.valueOf is used in later versions.
  2. extractDay should return false because it was unable to do anything with that string. However, the try block (line 1106) is inside the for loop (line 1097), so after a failure more tokens can still be examined, and the method can eventually return true. This allows the annotation to be created even though, technically, no annotation should be created because parsing failed. The try was moved outside the for loop in later versions.

So the only answer is to update to a later version (although I still can't update to a later version at this time).
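
Both points can be reproduced without CoreNLP. The following is a simplified sketch, not the actual ISODateInstance source: the class name, method names, the day-range check, and the token list are hypothetical stand-ins that only mirror the control flow described above.

```java
import java.util.Arrays;
import java.util.List;

class ExtractDaySketch {

    // Point 1: the value does not fit in an int, so Integer.valueOf throws.
    static boolean overflowsInt(String digits) {
        try {
            Integer.valueOf(digits);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // Point 2, 3.4.1-style shape: the try sits inside the loop, so a failed
    // parse is swallowed and later tokens can still make the method return true.
    static boolean tryInsideLoop(List<String> tokens) {
        boolean foundDay = false;
        for (String token : tokens) {
            try {
                int value = Integer.valueOf(token);
                if (value >= 1 && value <= 31) {
                    foundDay = true;
                }
            } catch (NumberFormatException e) {
                // Failure ignored; the loop moves on to the next token.
            }
        }
        return foundDay;
    }

    // Point 2, later-style shape: the try wraps the whole loop, so the first
    // failed parse ends the method with false.
    static boolean tryOutsideLoop(List<String> tokens) {
        boolean foundDay = false;
        try {
            for (String token : tokens) {
                int value = Integer.valueOf(token);
                if (value >= 1 && value <= 31) {
                    foundDay = true;
                }
            }
        } catch (NumberFormatException e) {
            return false;
        }
        return foundDay;
    }

    public static void main(String[] args) {
        // Hypothetical token list: the problem string followed by a valid day.
        List<String> tokens = Arrays.asList("200912031000", "03");
        System.out.println(overflowsInt("200912031000"));  // true
        System.out.println(Long.valueOf("200912031000"));  // 200912031000
        System.out.println(tryInsideLoop(tokens));         // true
        System.out.println(tryOutsideLoop(tokens));        // false
    }
}
```

Integer.valueOf is kept in both loop variants on purpose, so that the only difference between them is where the try block sits.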

demongolem