
I'm working with the Named Entity Recognition (NER) annotator of CoreNLP.

My problem is that I would like relative dates not to be recognized as entities. My goal is to connect dates with events.

Some interesting dates are 18 Feb 1997, the 20th of july, the year 1992, 4 days from today and Monday the 13th.

In this example I would like to highlight "18 Feb 1997", "20th of july" and "1992". Even if some of these dates are not complete, they can still be used to search for events.

On the other hand, "4 days from today" and "Monday the 13th" are not interesting for me: the first is relative to the current date (or to the date the text was written), while the second is too generic.

Is there a simple way to tell the NER annotator to discard relative dates?

Thank you

alsora
  • This, exactly as stated, is difficult, as the 20th of July is itself a relative date (which year?). In general, SUTime grounds relative dates against the `DocDateAnnotation`, which you can set with: `document.set(CoreAnnotations.DocDateAnnotation.class, Instant.now().toString());` and presumably clear by setting it to null, but this will then miss the 20th of July as well as 4 days from today. – Gabor Angeli Mar 08 '18 at 06:54
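To make the grounding point concrete, here is a small sketch that uses only `java.time`, not CoreNLP, and does not reproduce SUTime's logic. Both "4 days from today" and "the 20th of July" become calendar dates only once a reference (document) date is chosen. The class and helper names are hypothetical, the document date 2018-03-08 is an arbitrary example, and borrowing the reference year for a year-less date is just one naive policy assumed for illustration.

```java
import java.time.LocalDate;
import java.time.MonthDay;

public class GroundingDemo {
    // Hypothetical helper: resolve "N days from today" against a reference date.
    static LocalDate daysFrom(LocalDate docDate, int days) {
        return docDate.plusDays(days);
    }

    // Hypothetical helper: resolve a month/day with no year (e.g. "the 20th of July")
    // by borrowing the year of the reference date. This policy is an assumption.
    static LocalDate monthDayInDocYear(LocalDate docDate, MonthDay md) {
        return md.atYear(docDate.getYear());
    }

    public static void main(String[] args) {
        LocalDate docDate = LocalDate.parse("2018-03-08");
        System.out.println(daysFrom(docDate, 4));                           // 2018-03-12
        System.out.println(monthDayInDocYear(docDate, MonthDay.of(7, 20))); // 2018-07-20
    }
}
```

Changing `docDate` changes both results, whereas "18 Feb 1997" denotes the same day whatever the reference date is, which is the distinction the question is after.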

1 Answer


I found the following solution, which works very well in my case.

Each token that is part of a Time/Date named entity also carries an annotation containing the entity's normalized form (`CoreAnnotations.NormalizedNamedEntityTagAnnotation`).
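A minimal sketch of how that per-token value can be turned into one value per date mention, assuming that contiguous DATE tokens sharing the same normalized value form a single mention. It uses only the JDK: `Tok` and `Mention` are hypothetical stand-ins, and with real CoreNLP tokens the tag and value would presumably come from `token.ner()` and `token.get(CoreAnnotations.NormalizedNamedEntityTagAnnotation.class)`. The ISO-style values such as `1997-02-18` are also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class DateMentions {
    // Hypothetical stand-in for a CoreNLP token: word, NER tag, normalized value.
    static final class Tok {
        final String word, ner, normalized;
        Tok(String word, String ner, String normalized) {
            this.word = word; this.ner = ner; this.normalized = normalized;
        }
    }

    // A contiguous run of DATE tokens together with the normalized value they share.
    static final class Mention {
        final String text, normalized;
        Mention(String text, String normalized) { this.text = text; this.normalized = normalized; }
    }

    // Merge consecutive DATE tokens with the same normalized value into one mention.
    static List<Mention> dateMentions(List<Tok> toks) {
        List<Mention> out = new ArrayList<>();
        int i = 0;
        while (i < toks.size()) {
            Tok t = toks.get(i);
            if (!"DATE".equals(t.ner)) { i++; continue; }
            StringBuilder text = new StringBuilder(t.word);
            int j = i + 1;
            while (j < toks.size() && "DATE".equals(toks.get(j).ner)
                    && Objects.equals(t.normalized, toks.get(j).normalized)) {
                text.append(' ').append(toks.get(j).word);
                j++;
            }
            out.add(new Mention(text.toString(), t.normalized));
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> toks = List.of(
                new Tok("in", "O", null),
                new Tok("18", "DATE", "1997-02-18"),
                new Tok("Feb", "DATE", "1997-02-18"),
                new Tok("1997", "DATE", "1997-02-18"),
                new Tok("and", "O", null),
                new Tok("1992", "DATE", "1992"));
        for (Mention m : dateMentions(toks)) {
            System.out.println(m.text + " -> " + m.normalized); // "18 Feb 1997 -> 1997-02-18", then "1992 -> 1992"
        }
    }
}
```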

The absolute dates that I want to recognize have a normalized form that follows this pattern:

  • 18 Feb 1997 -> 1997/02/18
  • 20th of July -> XXXX/07/20
  • 1992 -> 1992

Using a regular expression, it is then possible to discard the annotations whose normalized form does not match this pattern in full:

    (\d{4}|X{4})(/\d{2}(/\d{2})?)?
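A minimal Java sketch of this filtering step, using only `java.util.regex`. It applies the pattern with `matches()`, so the whole normalized value has to fit rather than just a substring. As an assumption, the separator is widened to `[-/]` because SUTime may emit ISO-style values such as `1997-02-18` instead of the slashes shown above; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AbsoluteDateFilter {
    // A 4-digit year (or XXXX for an unknown year), optionally followed by a month,
    // then optionally a day. The [-/] separator is an assumption (see lead-in).
    static final Pattern ABSOLUTE =
            Pattern.compile("(\\d{4}|X{4})([-/]\\d{2}([-/]\\d{2})?)?");

    // matches() anchors at both ends, so values with any extra components are rejected.
    static boolean isAbsolute(String normalized) {
        return normalized != null && ABSOLUTE.matcher(normalized).matches();
    }

    // Keep only the normalized values that look like (possibly partial) absolute dates.
    static List<String> keepAbsolute(List<String> normalizedValues) {
        return normalizedValues.stream()
                .filter(AbsoluteDateFilter::isAbsolute)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keepAbsolute(List.of("1992", "P4D", "XXXX-07-20"))); // [1992, XXXX-07-20]
    }
}
```

Because the two separators are matched independently, a mixed value like `1997/02-18` would also pass; a backreference to the first separator would tighten that if it matters.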