1

I have a Date column that contains the following sample values:

  • Posted on June 25, 2010 at 1:01 PM
  • March 14, 2011
  • Friday, April 15, 2011 12:15 am
  • Thursday, March 31st, 2011, 1:11pm
  • Updated: 9:34 am, Fri Jun 3, 2011.

I want to extract the dates (in BOLD) from the given strings. Can I get a regular expression that would detect a date specified in words?

Thanks!!

Man-with-a-e
  • Is it multilingual or english only? – home Sep 11 '11 at 18:27
  • possible duplicate of [Natural Language date and time parser for java](http://stackoverflow.com/questions/1410408/natural-language-date-and-time-parser-for-java) – Bart Kiers Sep 11 '11 at 18:39
  • There is nothing concrete in that thread. I just want to know whether a regular expression or pattern matching can do this or not. – Man-with-a-e Sep 11 '11 at 18:45
  • I have tried to identify patterns. For example, I used Java's substring function to get the value after "on" and before "at" for samples such as "Posted on June 25, 2010 at 1:01 PM". But I am having difficulty with the other values. – Man-with-a-e Sep 11 '11 at 18:47

3 Answers

2

I guess it depends on how strict you need the expression to be. This one will work for all your examples:

/(January|February|March|April|May|June?|July|August|September|October|November|December)\s(\d\d?).+?(\d\d\d\d)/

But it does not enforce the st, nd, rd, th rules.

Nor does it enforce the comma separating the day from the year.

And there is a special case for a shortened June (the optional e handles "Jun" in your fifth example), but other shortened month names are not accounted for.

Sample output from Firebug:

>>> /(January|February|March|April|May|June?|July|August|September|October|November|December)\s(\d\d?).+?(\d\d\d\d)/.exec(s1)
["June 25, 2010", "June", "25", "2010"]
>>> /(January|February|March|April|May|June?|July|August|September|October|November|December)\s(\d\d?).+?(\d\d\d\d)/.exec(s2)
["March 14, 2011", "March", "14", "2011"]
>>> /(January|February|March|April|May|June?|July|August|September|October|November|December)\s(\d\d?).+?(\d\d\d\d)/.exec(s3)
["April 15, 2011", "April", "15", "2011"]
>>> /(January|February|March|April|May|June?|July|August|September|October|November|December)\s(\d\d?).+?(\d\d\d\d)/.exec(s4)
["March 31st, 2011", "March", "31", "2011"]
>>> /(January|February|March|April|May|June?|July|August|September|October|November|December)\s(\d\d?).+?(\d\d\d\d)/.exec(s5)
["Jun 3, 2011", "Jun", "3", "2011"]
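
The limitations listed above could be addressed along the following lines. This is only a sketch under assumptions: the names `MONTHS`, `DATE_RE` and `extractDate` are made up for illustration, the abbreviation list (including "Sept") is a guess at what the data might contain, and the day and year are not range-checked.

```javascript
// Hypothetical variation on the pattern above (not the original expression):
// every month may be abbreviated, the ordinal suffix is optional but limited
// to st/nd/rd/th, and the comma before the year is optional.
const MONTHS = [
  'Jan(?:uary)?', 'Feb(?:ruary)?', 'Mar(?:ch)?', 'Apr(?:il)?', 'May', 'June?',
  'July?', 'Aug(?:ust)?', 'Sep(?:t(?:ember)?)?', 'Oct(?:ober)?',
  'Nov(?:ember)?', 'Dec(?:ember)?'
].join('|');

const DATE_RE = new RegExp(
  '\\b(' + MONTHS + ')\\.?\\s+(\\d{1,2})(?:st|nd|rd|th)?,?\\s+(\\d{4})\\b'
);

// Returns the matched text plus its parts, or null when no date is found.
function extractDate(text) {
  const m = DATE_RE.exec(text);
  if (m === null) {
    return null;
  }
  return { text: m[0], month: m[1], day: Number(m[2]), year: Number(m[3]) };
}

const samples = [
  'Posted on June 25, 2010 at 1:01 PM',
  'March 14, 2011',
  'Friday, April 15, 2011 12:15 am',
  'Thursday, March 31st, 2011, 1:11pm',
  'Updated: 9:34 am, Fri Jun 3, 2011.'
];
for (const s of samples) {
  console.log(extractDate(s));
}
```

The pattern is still purely syntactic, so an input such as "Feb 31, 2011" would be accepted; checking that the day exists in the given month would need a separate step.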
Paul Grime
2

Do not reinvent the wheel: this is a fairly common problem, and plenty of existing tools already do what you want. Try reading http://javatechniques.com/blog/dateformat-and-simpledateformat-examples/ or search Stack Overflow a little and you'll find plenty!

UlfR
1
/\w+\s\d+(st)?(nd)?(rd)?(th)?,\s+\d+/

A more permissive regex: it does not check for an actual month name, only for the general form "Month Day(optional suffix), Year".

Note that it would also match something like:

Blah 45rd, 2022222
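
One way to keep the generic "word, day, year" shape while rejecting input like that is to bound the day to 1 through 31 and the year to four digits. The sketch below is an assumption-laden illustration, not a fix proposed in this answer: `LOOSE_RE` repeats the expression above, while `BOUNDED_RE` and `matchOrNull` are hypothetical names.

```javascript
// The expression from this answer, unchanged.
const LOOSE_RE = /\w+\s\d+(st)?(nd)?(rd)?(th)?,\s+\d+/;

// Hypothetical bounded variant: letters only for the "month" word,
// a day from 1 to 31, at most one ordinal suffix, and exactly four year digits.
const BOUNDED_RE = /\b[A-Za-z]+\s(?:0?[1-9]|[12]\d|3[01])(?:st|nd|rd|th)?,\s+\d{4}\b/;

// Returns the matched substring, or null when the pattern does not match.
function matchOrNull(re, text) {
  const m = re.exec(text);
  return m === null ? null : m[0];
}

console.log(matchOrNull(LOOSE_RE, 'Blah 45rd, 2022222'));   // matches the whole string
console.log(matchOrNull(BOUNDED_RE, 'Blah 45rd, 2022222')); // null
console.log(matchOrNull(BOUNDED_RE, 'Thursday, March 31st, 2011, 1:11pm'));
```

Because the month word is still unchecked, the bounded variant would accept "Blah 12th, 2022"; it only narrows the numeric parts.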

buddyp450