2

Does anyone know of a library, ideally in Python, that can have a stab at pulling dates out of text?

"Shall we go to the library today" -> 21 Jan 10
"Starting on the 1st of January" -> 1 Jan 10
"Anytime between 3nd and 5th of Feb 2009" -> 3 Feb 09, 5 Feb 09

It's a tough problem, which is probably why I haven't found anything! I'm already using NLTK, by the way, if that helps.

PhoebeB
  • Not that I know of any, but does it need to be international or US only? US only could be done with a set of regexes, but international really increases the number of regexes :( – extraneon Jan 21 '10 at 12:11
  • dupe: http://stackoverflow.com/questions/1258712/fuzzy-timestamp-parsing-with-python and http://stackoverflow.com/questions/1822787/parsing-dates-from-free-text-input-in-python among many others – SilentGhost Jan 21 '10 at 12:17
  • "fuzzy" is a good search term. Following the suggested threads above, I found this: http://stackoverflow.com/questions/1258712/fuzzy-timestamp-parsing-with-python/1378134#1378134 which looks promising. And yes, international is required, sigh. – PhoebeB Jan 21 '10 at 12:28
  • These answers indeed parse strings related to time, but don't help with extracting time phrases from free text. – Adam Matan Jan 21 '10 at 12:53
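To make the regex idea from the comments concrete, here is a minimal standard-library sketch that handles the three example sentences from the question. It is not one of the libraries mentioned in this thread. The names `extract_dates`, `DATE_RE` and the `today` parameter are made up for illustration, and it assumes English month names, explicit four-digit years (otherwise the reference date's year is used) and only the keyword "today" for relative dates.

```python
import re
from datetime import date

# Map the first three letters of an English month name to its number.
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# A day number with an optional suffix; sloppy input such as "3nd" is accepted.
DAY = r"\d{1,2}(?:st|nd|rd|th)?"
MONTH = (r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
         r"|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)")

# One or more days joined by "and"/"to", then "of <month>", then an optional year.
DATE_RE = re.compile(
    r"\b(?P<days>" + DAY + r"(?:\s+(?:and|to)\s+" + DAY + r")*)"
    + r"\s+(?:of\s+)?(?P<month>" + MONTH + r")\b"
    + r"(?:\s+(?P<year>\d{4}))?",
    re.IGNORECASE,
)


def extract_dates(text, today):
    """Return the dates found in text, resolving gaps against `today`."""
    found = []
    for match in DATE_RE.finditer(text):
        month = MONTHS[match.group("month")[:3].lower()]
        # Assumption: a missing year means the year of the reference date.
        year = int(match.group("year")) if match.group("year") else today.year
        for day in re.findall(r"\d{1,2}", match.group("days")):
            try:
                found.append(date(year, month, int(day)))
            except ValueError:
                pass  # skip impossible dates such as "31 Feb"
    # Fall back to the reference date only if the text literally says "today".
    if not found and re.search(r"\btoday\b", text, re.IGNORECASE):
        found.append(today)
    return found


reference = date(2010, 1, 21)
for sentence in ["Shall we go to the library today",
                 "Starting on the 1st of January",
                 "Anytime between 3nd and 5th of Feb 2009"]:
    print(sentence, "->", [d.strftime("%d %b %y") for d in extract_dates(sentence, reference)])
```

Even this small pattern shows why the international case is painful: every extra language, ordering (month first, numeric formats) or relative phrase needs its own rule, which is the point at which a dedicated parser becomes attractive.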

3 Answers

4

Looks like this module is what you are looking for: parsedatetime

Nadia Alramli
  • You'll probably have to tokenize your lines before passing them to the parser. – Adam Matan Jan 21 '10 at 12:26
  • Should have added this to the question: I have been trying this module, but it is easily fooled, and then you have the problem of working out whether the result is valid! Thanks for the suggestion though. – PhoebeB Jan 21 '10 at 12:27
2

The PyParsing site has a little bonus script for parsing time expressions. I would say that is worth a look for you!

Edit: I see you already ended up there as I was typing my suggestion. Good luck to you!

jathanism
1

Thanks for the contributions. In the end I followed up one of the comments, which led to pyparsing, which led to the beginnings of a solution. Many thanks, all.

I have posted the work in progress, two pyparsing snippets, here: http://pbjots.blogspot.com/2010/01/using-pyparsing-to-extract-dates-from.html in case they help anyone.

PhoebeB