I need to find a Python regular expression in order to match every valid date in a raw text file. I split the text in lines and put them in a Pandas Series, the goal now, is to extract only the date in every line getting a series of dates. I was able to match most of the numerical date formats, but I stopped when I had to deal with literal months (Jan, January, Feb, February,...). In particular, I need a regex (or a set of them) which match the following formats:
- Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
- 20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009
- Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009
- Feb 2009; Sep 2009; Oct 2010
Any help will be appreciated, thank you in advance!