I have a dataset with a date column, but the values sometimes look like this:

  • '20 March 2020 (UK)\n'
  • 'Paid on $ September 2005'
  • '4 September 2020 (Japan)'

How can I extract the dates from the column, please?

Adam Strauss

1 Answer

from dateutil.parser import parse

def is_date(string, fuzzy=False):
    """
    Return whether the string can be interpreted as a date.

    :param string: str, string to check for date
    :param fuzzy: bool, ignore unknown tokens in string if True
    """
    try:
        parse(string, fuzzy=fuzzy)
        return True
    except ValueError:
        return False

somestrings = ['20 March 2020 (UK)\n', 'Paid on $ September 2005', '4 September 2020 (Japan)']
result = []
for somestring in somestrings:
    # Keep only the whitespace-separated tokens that parse as a date on their own.
    tokens = [token for token in somestring.split() if is_date(token)]
    result.append(' '.join(tokens))

print(result)
# Output:
# ['20 March 2020', 'September 2005', '4 September 2020']
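
The token-by-token check keeps any token that parses as a date on its own, so a stray number elsewhere in the string would likely be kept too. If the dates always follow the "day Month year" pattern from the question, a regular expression may be more predictable. The following is only a sketch using the standard library, not part of the approach above. The names MONTHS, DATE_RE and extract_date are made up for illustration, and it assumes full, capitalized English month names, a four-digit year, and that a missing day should default to the 1st.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Optional day (1-2 digits), a full English month name, then a 4-digit year.
DATE_RE = re.compile(
    r"(?:\b(\d{1,2})\s+)?\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b"
)


def extract_date(text):
    """Return (matched_text, date) for the first date-like span, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day, month, year = match.groups()
    try:
        # Assumption: when no day is present, use the 1st of the month.
        parsed = date(int(year), MONTHS.index(month) + 1, int(day or 1))
    except ValueError:
        # The pattern matched but is not a real calendar date, e.g. "31 February 2020".
        return None
    return match.group(0), parsed


samples = ['20 March 2020 (UK)\n', 'Paid on $ September 2005', '4 September 2020 (Japan)']
extracted = [extract_date(s) for s in samples]
for item in extracted:
    print(item)
```

Each result is a pair of the matched text and a datetime.date, so the first sample gives '20 March 2020' together with date(2020, 3, 20), and the second gives 'September 2005' with the day filled in as 1. For a whole column, the same function could be applied to each value in turn.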