
Can somebody suggest a Java library capable of parsing date/time calendar events from unstructured data? For example:

  • Starts 10pm Tonight! Sunday feb 10th => 10/Feb/2013 10pm
  • tomorrow (feb 10th) => 10/Feb/2013
  • Sunday Feb 10\r\nwith daily screenings till Feb 16th

and so on

The input data comes from the user, so they may enter it in any random format. I started off by identifying all the possible tokens and doing a regex match to parse them. I wonder if someone can suggest a Java library that might actually help with the parsing.
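To illustrate the token-and-regex approach, here is a minimal sketch using only the JDK. The class and method names (`DateTokenSketch`, `findMonthDays`, `findTimes`) are hypothetical, and the two patterns only cover month/day and am/pm tokens like the ones in the examples above; relative words such as "tonight" or "tomorrow" and date ranges are not handled.

```java
import java.time.DateTimeException;
import java.time.LocalTime;
import java.time.MonthDay;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class DateTokenSketch {
    private static final List<String> MONTHS = Arrays.asList(
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec");

    // Month name (short or full) followed by a day, e.g. "feb 10th", "February 3".
    private static final Pattern MONTH_DAY = Pattern.compile(
        "\\b(jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
        + "|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
        + "\\.?\\s+(\\d{1,2})(?:st|nd|rd|th)?\\b",
        Pattern.CASE_INSENSITIVE);

    // 12-hour clock time, e.g. "10pm", "7:30 am".
    private static final Pattern TIME = Pattern.compile(
        "\\b(\\d{1,2})(?::(\\d{2}))?\\s*(am|pm)\\b",
        Pattern.CASE_INSENSITIVE);

    // Returns every month/day token found in the text, in order of appearance.
    static List<MonthDay> findMonthDays(String text) {
        List<MonthDay> result = new ArrayList<>();
        Matcher m = MONTH_DAY.matcher(text);
        while (m.find()) {
            String key = m.group(1).substring(0, 3).toLowerCase(Locale.ROOT);
            int month = MONTHS.indexOf(key) + 1;
            int day = Integer.parseInt(m.group(2));
            try {
                result.add(MonthDay.of(month, day));
            } catch (DateTimeException e) {
                // Skip impossible dates such as "feb 31".
            }
        }
        return result;
    }

    // Returns every am/pm time token found in the text, in order of appearance.
    static List<LocalTime> findTimes(String text) {
        List<LocalTime> result = new ArrayList<>();
        Matcher m = TIME.matcher(text);
        while (m.find()) {
            int hour = Integer.parseInt(m.group(1));
            int minute = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
            if (hour < 1 || hour > 12 || minute > 59) {
                continue; // not a valid 12-hour clock time
            }
            boolean pm = m.group(3).equalsIgnoreCase("pm");
            result.add(LocalTime.of(hour % 12 + (pm ? 12 : 0), minute));
        }
        return result;
    }
}
```

Calling `DateTokenSketch.findMonthDays(text)` and `DateTokenSketch.findTimes(text)` on the first example yields one month/day token and one time token; the year and any "tonight"/"tomorrow" resolution would still have to come from context.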

I went through other posts on SO, but they seem to suggest techniques; I wonder if somebody knows of a library.

Thanks

jaipster

2 Answers


You could take some of the trunk source from Apache OpenNLP (natural language processing) at http://opennlp.apache.org/, or just set up a callable RESTful web service by running OpenNLP on your server. The benefit of using OpenNLP out of the box is that you get entity extractors through the nameFinder interface for dates, times, organizations, locations, and people. You could also build an example file of more typical context for the items of interest, annotated with their entity types, and train the NLP model against it to get a better hit rate for your context. I have a working example of a C# NLP in the apps section of my portfolio at http://www.augmentedintel.com/apps/csharpnlp/extract-names-from-text.aspx.

  • Thanks Don for the response :). I tried other parsing libraries (except this one), but the experience was not very good. So I came up with a ranking algorithm based on the keywords in the text, and with some rounds of manual review its accuracy improved. However, I will probe in the direction you suggested. Thanks – jaipster Apr 28 '13 at 04:56

UTAH (https://github.com/sonalake/utah-parser) can handle generic parsing of unstructured text into maps. Once you've done that, you should be able to pass the resulting maps to a formatter.
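As a rough sketch of that last step, the snippet below takes a map of already-extracted fields and formats it with `java.time`. It does not call UTAH itself: the map keys `day`, `month`, and `year` are assumptions for illustration (the real keys depend on how the parser is configured), and `ParsedFieldFormatter` is a hypothetical class name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; the map keys are assumed, not UTAH's.
class ParsedFieldFormatter {
    // Accepts input such as "10 feb 2013" or "10 Feb 2013" (case-insensitive month).
    private static final DateTimeFormatter INPUT = new DateTimeFormatterBuilder()
        .parseCaseInsensitive()
        .appendPattern("d MMM uuuu")
        .toFormatter(Locale.ENGLISH);

    // Produces the style used in the question, e.g. "10/Feb/2013".
    private static final DateTimeFormatter OUTPUT =
        DateTimeFormatter.ofPattern("dd/MMM/uuuu", Locale.ENGLISH);

    // Throws DateTimeParseException if the fields do not form a valid date.
    static String format(Map<String, String> fields) {
        String joined = fields.get("day") + " " + fields.get("month") + " " + fields.get("year");
        LocalDate date = LocalDate.parse(joined, INPUT);
        return date.format(OUTPUT);
    }
}
```

Going through `LocalDate` rather than concatenating strings means invalid combinations are rejected at parse time instead of being formatted silently.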