So I have this data set:
1.0 20/20/1999
2.0 31/2014
3.0 2015
4.0 2008-01-01 00:00:00
5.0 1903-10-31 00:00:00
6.0 1900-01-20 00:00:00
7.0 2011-02-21 00:00:00
8.0 1999-10-11 00:00:00
Those dates imported from excel but since the dataset is large and from multiple sources I can have any number of yyyy-mm-dd permutations with - or / or none as separators and missing months or days. It's a nightmare.
I want to keep those valid formats while those that are not recognized as valid should return a year or nothing.
This is where I got so far:
- I import as is from excel
df['date_col'].date_format('%Y-%m-%d')
I found regex to match only year field but I'm stuck on with what to use it on ^[0-9]{2,2}$
I have tried dateutil without success. It's refusing to parse examples with month only