20

Recently I have been challenged by a seemingly "easy" problem. Suppose there are sentences (saved in a String), and I need to find out whether the String contains a date. The challenge is that the date can be in many different formats. Some examples are shown in the list:

  • June 12, 1956
  • London, 21st October 2014
  • 13 October 1999
  • 01/11/2003

It is worth mentioning that each date is embedded in a longer string. For example:

String s = "This event took place on 13 October 1999.";

My question is how to detect that this string contains a date. My first approach was to search for the word "event" and then try to locate the date near it, but as the number of possible date formats grows, this solution becomes unwieldy. The second approach was to build a list of month names and search for them. This gave good results but still misses dates written entirely in digits.

One solution I have not tried yet is to design regular expressions and look for a match in the string. I am not sure how much this would hurt performance.

What would be a good solution to consider? Has anybody faced a similar problem before, and what solutions did you find?

One thing is certain: the strings contain no times, so the date is the only part of interest.

Tunaki
bbakiu
  • The only way I can think of to get close to solving this is to combine all your approaches. E.g. look for the months in the string, after that try to find American dates (MM/DD/YYYY), after that try to find European dates (DD.MM.YYYY)... It just depends on what kind of strings you receive! If some string contains "it happened on the first day in the 2nd month in 1989" you may not be able to detect it! – ParkerHalo Nov 05 '15 at 14:39
  • The only way is to parse your string. – Nvan Nov 05 '15 at 14:42
  • 3
    I remember someone with the same problem: http://stackoverflow.com/questions/33098511/how-to-retrieve-temporal-values-from-string-in-java/33099268#33099268 – Emanuele Ivaldi Nov 05 '15 at 14:42
  • 2
    It's not an easy problem. And if you need to know for sure what the dates are, it's unsolvable: your last example could be November 1 or January 11 – edc65 Nov 05 '15 at 19:03

4 Answers

23

Using the natty.joestelmach.com library

Natty is a natural language date parser written in Java. Given a date expression, natty will apply standard language recognition and translation techniques to produce a list of corresponding dates with optional parse and syntax information.

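import java.util.Date;
import java.util.List;

import com.joestelmach.natty.Parser;

public class NattyExample {
    public static void main(String[] args) {
        // parse() returns a List<DateGroup>; take the dates of the first group
        List<Date> dates = new Parser()
                .parse("Start date 11/30/2013 , end date Friday, Sept. 7, 2013")
                .get(0).getDates();
        System.out.println(dates.get(0));
        System.out.println(dates.get(1));
    }
}

// output:
// Sat Nov 30 11:14:30 BDT 2013
// Sat Sep 07 11:14:30 BDT 2013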
NightSkyCode
3

What you are after is Named Entity Recognition. I'd start with Stanford NLP. The 7-class model includes a date class, but the online demo struggles and misses the "13". :(

Natty mentioned above gives a better answer.

Michael Lloyd Lee mlk
1

If it's only a single String, you could use regular expressions as you mentioned. You would need to write an expression for each date format you want to find. Here are some examples: Regular Expressions - dates
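As a rough illustration of the regular-expression route, here is a minimal sketch using only `java.util.regex`. The class name `DateFinder`, the `MONTH` list, and the three alternatives are assumptions made for illustration and cover only the four example formats from the question. The pattern is case-sensitive, knows only full English month names, and does not check that a match is a real calendar date. Compiling the `Pattern` once and reusing it avoids paying the compilation cost on every string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateFinder {
    // Full English month names only (an assumption; abbreviations are not handled).
    private static final String MONTH =
        "(?:January|February|March|April|May|June|July|August"
        + "|September|October|November|December)";

    // Three alternatives, tried in order at each position:
    //   1. "June 12, 1956"
    //   2. "21st October 2014" or "13 October 1999"
    //   3. "01/11/2003"
    private static final Pattern DATE = Pattern.compile(
        "\\b" + MONTH + " \\d{1,2}, \\d{4}\\b"
        + "|\\b\\d{1,2}(?:st|nd|rd|th)? " + MONTH + " \\d{4}\\b"
        + "|\\b\\d{1,2}/\\d{1,2}/\\d{4}\\b");

    // Returns every substring of text that looks like one of the formats above.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findDates("This event took place on 13 October 1999."));
        // prints [13 October 1999]
        System.out.println(findDates("London, 21st October 2014"));
        // prints [21st October 2014]
        System.out.println(findDates("No date in this sentence."));
        // prints []
    }
}
```

A non-empty result answers the "is there a date in this String" question; each new format means one more alternative in the pattern.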

In case it's a document or a big text, you will need a parser. You could use a lexical analysis approach.

Depending on the project, using an external library, as mentioned in some answers, might be a good idea. Sometimes that is not an option.

Jelle
0

I've done this before with good precision and recall. You'll need GATE and its ANNIE plugin.

  1. Use the GATE UI tool to create a .GAPP file that contains your processing resources.

  2. Load the .GAPP file in your code and read the extracted Date annotation set.

Step 2 can be done as follows:

import java.io.File;
import gate.*;
import gate.util.persistence.PersistenceManager;

// Assumes Gate.init() has already been called.
Corpus corpus = Factory.newCorpus("Gate Corpus");
Document gateDoc = Factory.newDocument("This event took place on 13 October 1999.");
corpus.add(gateDoc);
File pluginsHome = Gate.getPluginsHome();
File anniePlugin = new File(pluginsHome, "ANNIE");
File annieGapp = new File(anniePlugin, "Test.gapp");
// Load the saved application (.gapp) and run it over the corpus.
CorpusController annieController = (CorpusController) PersistenceManager.loadObjectFromFile(annieGapp);
annieController.setCorpus(corpus);
annieController.execute();

Later you can see the extracted annotations like this:

AnnotationSet ann = gateDoc.getAnnotations();
System.out.println("Found annotations of the following types: " + ann.getAllTypes());
// Only the annotations of type Date
System.out.println(ann.get("Date"));

I'm sure you can do it easily with the built-in Date annotation set. It is also very extensible.

To extend the Date annotation set, write a lenient JAPE rule, say 'DateEnhanced', that builds on the built-in ANNIE Date annotation and also covers certain kinds of dates like "9/11". On the right-hand side (RHS) of the 'DateEnhanced' rule, chain Java regexes to filter out unwanted matches (if any).
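The filtering idea can be tried outside GATE as well. The following is a minimal sketch in plain Java, not JAPE and not part of the GATE API: the class `NumericDateFilter` and the method `interpretations` are hypothetical names, and the sketch assumes the candidates are all-digit strings with a four-digit year, such as "01/11/2003". It uses `java.time` with strict resolution to discard candidates that are not real calendar dates, and it returns every valid reading, since a string like "01/11/2003" is valid both day-first and month-first.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.List;

class NumericDateFilter {
    // STRICT resolution rejects impossible dates such as 31 February.
    // "uuuu" is used for the year because "yyyy" needs an era in strict mode.
    private static final DateTimeFormatter DAY_FIRST =
        DateTimeFormatter.ofPattern("d/M/uuuu").withResolverStyle(ResolverStyle.STRICT);
    private static final DateTimeFormatter MONTH_FIRST =
        DateTimeFormatter.ofPattern("M/d/uuuu").withResolverStyle(ResolverStyle.STRICT);

    // Returns each distinct calendar date the candidate could denote;
    // an empty list means the candidate should be filtered out.
    static List<LocalDate> interpretations(String candidate) {
        List<LocalDate> result = new ArrayList<>();
        for (DateTimeFormatter f : new DateTimeFormatter[] {DAY_FIRST, MONTH_FIRST}) {
            try {
                LocalDate date = LocalDate.parse(candidate, f);
                if (!result.contains(date)) {
                    result.add(date);
                }
            } catch (DateTimeParseException e) {
                // Not a valid date under this reading; try the next one.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(interpretations("01/11/2003")); // prints [2003-11-01, 2003-01-11]
        System.out.println(interpretations("13/10/1999")); // prints [1999-10-13]
        System.out.println(interpretations("31/02/2003")); // prints []
    }
}
```

Two results for one candidate means the text alone cannot settle which date was meant, so the caller has to pick a convention or use context.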

Identity1