The code below prints only 2-Nov-2018. How do I modify it so that both date formats are matched?
import re
string = "some text contains 2-Nov-2018 and 3-11-2018"
date = re.findall(r'\d{1,2}[/-]\D{1,8}[/-]\d{2,4}', string)
print(date)
I think the simplest approach is to write one pattern per format and combine them.
(This assumes you are only looking for these two formats. Doing it yourself gets much more complicated if you need to handle every possible date format.)
import re
date_string = "some text contains 2-Nov-2018 and 3-11-2018"
formats = [r'\d{1,2}[/-]\D{1,8}[/-]\d{2,4}',  # day-month name-year, e.g. 2-Nov-2018
           r'\d{1,2}[/-]\d{1,2}[/-]\d{2,4}']   # all-numeric, e.g. 3-11-2018
dates = re.findall('|'.join(formats), date_string)  # Join with the | (alternation) operator
print(dates)
# ['2-Nov-2018', '3-11-2018']
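If you prefer a single pattern, you can put the alternation only around the month part. The sketch below assumes the month is either 3 to 9 letters or 1 to 2 digits (a narrower choice than the \D{1,8} above, which matches any non-digit characters, including spaces), and adds \b word boundaries. It also shows why the group should be non-capturing, (?:...): when a pattern contains a capturing group, re.findall returns only the group's text instead of the whole match.

```python
import re

date_string = "some text contains 2-Nov-2018 and 3-11-2018"

# Month part is either letters (Nov, November) or digits (11); (?:...) does not capture
pattern = r'\b\d{1,2}[/-](?:[A-Za-z]{3,9}|\d{1,2})[/-]\d{2,4}\b'
dates = re.findall(pattern, date_string)
print(dates)

# Same pattern with a capturing group: findall now returns only the month part
capturing = r'\b\d{1,2}[/-]([A-Za-z]{3,9}|\d{1,2})[/-]\d{2,4}\b'
groups = re.findall(capturing, date_string)
print(groups)
```

With the non-capturing version you get the full dates; with the capturing version you get just 'Nov' and '11'.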
To standardize the dates after this, you could try something like pandas.to_datetime:
import pandas as pd
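dates = ['2-Nov-2018', '3-11-2018']
# dayfirst=True reads the ambiguous 3-11-2018 as 3 November rather than 11 March
std_dates = [pd.to_datetime(d, dayfirst=True) for d in dates]
print(std_dates)
# [Timestamp('2018-11-02 00:00:00'), Timestamp('2018-11-03 00:00:00')]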
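If you would rather not depend on pandas, a minimal alternative is to try a short list of explicit formats with the standard library's datetime.strptime. This is a sketch under stated assumptions: the helper name parse_date and the FORMATS list are hypothetical, numeric dates are assumed to be day-first, only 4-digit years are covered, and %b expects English month abbreviations (the default C locale).

```python
from datetime import datetime

# Hypothetical formats to try, in order; assumes day-first numeric dates and 4-digit years
FORMATS = ['%d-%b-%Y', '%d-%m-%Y', '%d/%b/%Y', '%d/%m/%Y']

def parse_date(text):
    """Return a datetime for the first format in FORMATS that fits, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not fit; try the next one
    return None

dates = ['2-Nov-2018', '3-11-2018']
parsed = [parse_date(d) for d in dates]
print(parsed)
```

Listing the formats explicitly makes the day/month interpretation an explicit choice rather than something a parser guesses.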
As some comments mentioned, there may already be libraries that do all of this for you, so if you need a more general approach, I would look at those.