I have a dataset with a date column, but the values sometimes look like this:
- '20 March 2020 (UK)\n'
- 'Paid on $ September 2005'
- '4 September 2020 (Japan)'
How can I extract the dates from the column, please?
from dateutil.parser import parse


def is_date(string, fuzzy=False):
    """
    Return whether the string can be interpreted as a date.

    :param string: str, string to check for date
    :param fuzzy: bool, ignore unknown tokens in string if True
    """
    try:
        parse(string, fuzzy=fuzzy)
        return True
    except ValueError:
        return False


somestrings = ['20 March 2020 (UK)\n', 'Paid on $ September 2005', '4 September 2020 (Japan)']
result = []
for somestring in somestrings:
    # Keep only the space-separated tokens that parse as a date on their own,
    # then join them back into a single date string.
    result.append(' '.join([substring for substring in somestring.split(' ') if is_date(substring)]))
print(result)
['20 March 2020', 'September 2005', '4 September 2020']
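
If you want actual datetime objects rather than cleaned-up strings, the same token filter can be wrapped in a helper and the joined tokens parsed once more. This is only a minimal sketch: `extract_date` is a hypothetical name, and the fixed `default` of 1 January 1900 is my own assumption, used so that fields missing from the text (such as the day in 'September 2005') are filled in the same way no matter when the code runs.

```python
from datetime import datetime

from dateutil.parser import parse

# Assumed fallback for fields missing from the text (e.g. no day given).
FALLBACK = datetime(1900, 1, 1)


def is_date(string):
    """Return True if dateutil can parse the string on its own."""
    try:
        parse(string, default=FALLBACK)
        return True
    except (ValueError, OverflowError):
        return False


def extract_date(text):
    """Hypothetical helper: keep the date-like tokens and parse them together.

    Returns None when nothing in the text looks like a date.
    """
    tokens = [token for token in text.split() if is_date(token)]
    if not tokens:
        return None
    try:
        return parse(' '.join(tokens), default=FALLBACK)
    except (ValueError, OverflowError):
        # Tokens that parse individually may still not form one valid date.
        return None


values = ['20 March 2020 (UK)\n', 'Paid on $ September 2005', '4 September 2020 (Japan)']
dates = [extract_date(value) for value in values]
print(dates)
```

Note that this uses `split()` instead of `split(' ')`, so runs of whitespace and trailing newlines are handled as well. For a column in a DataFrame, the helper could be applied to each value in turn.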