
The code below prints only 2-Nov-2018. How do I modify it so that both date formats are matched?

import re
string = "some text contains 2-Nov-2018 and 3-11-2018"

# Raw string (r'...') so the backslashes reach the regex engine unchanged
date = re.findall(r'\d{1,2}[/-]\D{1,8}[/-]\d{2,4}', string)
print(date)  # ['2-Nov-2018']
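To see why only the first date is found: `\D` matches any *non-digit* character, so `\D{1,8}` accepts the month name `Nov` but rejects the numeric month `11`. A quick check with `re.fullmatch` on each candidate (the candidate list here is just the two dates from the question) makes this visible:

```python
import re

# The question's original pattern, written as a raw string
pattern = r'\d{1,2}[/-]\D{1,8}[/-]\d{2,4}'

for candidate in ['2-Nov-2018', '3-11-2018']:
    # fullmatch succeeds only if the whole candidate fits the pattern
    matched = re.fullmatch(pattern, candidate) is not None
    print(candidate, matched)
```

The second candidate fails because its middle part, `11`, consists of digits.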
    Why not use an external module to do this? [`datefinder`](https://datefinder.readthedocs.io/en/latest/) worked for me. – cs95 Dec 04 '18 at 05:50

1 Answer


I think the simplest approach is to write one pattern per date format and join them into a single regex.

(This assumes you are only looking for these two formats; doing it yourself gets much more complicated if you need to handle every possible date format.)

import re

date_string = "some text contains 2-Nov-2018 and 3-11-2018"

formats = [r'\d{1,2}[/-]\D{1,8}[/-]\d{2,4}',   # e.g. 2-Nov-2018 (month as text)
           r'\d{1,2}[/-]\d{1,2}[/-]\d{2,4}']   # e.g. 3-11-2018 (month as a number)
dates = re.findall('|'.join(formats), date_string)  # join the patterns with the | (alternation) operator

print(dates)

# ['2-Nov-2018', '3-11-2018']
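An alternative sketch (not part of the original answer): instead of joining two whole patterns, put the alternation only on the month part. This assumes month names are plain ASCII words of 3 to 9 letters ("Nov" up to "September"). The `(?:...)` group is non-capturing, which matters because `re.findall` returns tuples of groups instead of whole matches as soon as a pattern contains a capturing group.

```python
import re

date_string = "some text contains 2-Nov-2018 and 3-11-2018"

# Day, separator, then EITHER a 1-2 digit month OR a 3-9 letter month name,
# separator, then a 2-4 digit year. \b keeps a match from starting or
# ending in the middle of a longer run of digits.
pattern = r'\b\d{1,2}[/-](?:\d{1,2}|[A-Za-z]{3,9})[/-]\d{2,4}\b'

dates = re.findall(pattern, date_string)
print(dates)  # ['2-Nov-2018', '3-11-2018']
```

Unlike `\D{1,8}`, the letter class `[A-Za-z]` cannot swallow spaces or punctuation, so the pattern is a little stricter about what it accepts as a month.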

To standardize the dates after this, you could try something like `pandas.to_datetime`:

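import pandas as pd

dates = ['2-Nov-2018', '3-11-2018']

# dayfirst=True reads '3-11-2018' as 3 November; without it, pandas
# defaults to month-first and returns 11 March instead.
std_dates = [pd.to_datetime(d, dayfirst=True) for d in dates]

print(std_dates)

# [Timestamp('2018-11-02 00:00:00'), Timestamp('2018-11-03 00:00:00')]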
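If you would rather not pull in pandas, or want the day-first assumption spelled out explicitly, a minimal standard-library sketch is to try a fixed list of `datetime.strptime` formats in order. The format list and the `parse_date` helper are hypothetical, written for illustration; they assume day-month-year ordering and only the `-` and `/` separators that the regex above allows.

```python
from datetime import datetime

dates = ['2-Nov-2018', '3-11-2018']

# Hypothetical format list, tried in order; all assume day-month-year.
# %b is the abbreviated month name ('Nov' in the default C/English locale).
FORMATS = ['%d-%b-%Y', '%d-%m-%Y', '%d/%b/%Y', '%d/%m/%Y']

def parse_date(text):
    """Hypothetical helper: return a datetime for the first format that fits, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not fit; try the next one
    return None

std_dates = [parse_date(d) for d in dates]
print([d.date().isoformat() for d in std_dates])  # ['2018-11-02', '2018-11-03']
```

Returning `None` instead of raising lets you filter out regex matches that turn out not to be valid dates.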

As mentioned in the comments (e.g. `datefinder`), there are libraries already built to do all of this for you, so if you need a more general approach, I would take a look at those.

Stephen C