Extracting a date using regex from a cell in a pandas DataFrame

Question

I have the following dataframe:

column 1   Description                          Extracted Data
date       January 15,2020 is important day

I want to get the following result:

column 1   Description                          Extracted Data
date       January 15,2020 is important day     January 15,2020

I tried:

df.loc[df['column 1']=='date','Extracted Data']=df['Description'].str.extract(r'((January)|[/. ])|(\d{1,2}|[/., ]|\d{4})')

but I am not getting the desired result. Instead, I am getting a dataframe with all NaN values. How can I fix this?

check out [this](https://stackoverflow.com/questions/47877816/regular-expression-of-different-format-of-dates-in-python) — Shubham Shaswat, Feb 11 '20 at 07:09

score 1 · Accepted Answer · answered Feb 11 '20 at 07:11

Use .* to match any run of characters, followed by a comma and a four-digit year, all inside a single capture group:

import pandas as pd

df = pd.DataFrame({'column 1': ['date'], 'Description': ['January 15,2020 is important day']})
df['Extracted Data'] = df['Description'].str.extract(r'(.*,\d{4})')

Output:

  column 1                       Description   Extracted Data
0     date  January 15,2020 is important day  January 15,2020
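
A note on why the original attempt misbehaves: the pattern in the question contains three capture groups, so str.extract returns a three-column DataFrame rather than a single column of strings, which is likely part of why the assignment did not give the expected values. The sketch below is one possible variant, not the only fix. It uses a single, tighter capture group (the exact pattern is my assumption about the date format) and keeps the result only for rows where column 1 is 'date', as in the question. The second row is made-up sample data.

```python
import pandas as pd

# Hypothetical sample data: the second row is added only to show the row filter.
df = pd.DataFrame({
    'column 1': ['date', 'note'],
    'Description': ['January 15,2020 is important day',
                    'February 3, 2021 was also noted'],
})

# One capture group: month name, 1-2 digit day, comma, four-digit year.
pattern = r'([A-Za-z]+\s+\d{1,2},\s*\d{4})'

# expand=False makes str.extract return a Series instead of a DataFrame.
extracted = df['Description'].str.extract(pattern, expand=False)

# Keep the extracted value only where column 1 equals 'date'; other rows become NaN.
mask = df['column 1'] == 'date'
df['Extracted Data'] = extracted.where(mask)

print(df)
```

Unlike the greedy .* pattern, this one does not swallow text that comes before the month name, at the cost of assuming the date always looks like "Month D,YYYY".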

score 1 · Answer 2 · answered Feb 11 '20 at 07:12

This works as a one-liner:

import re
df['Extracted data'] = [re.match(r'[A-Za-z]+ \d{2},\d{4}', x)[0] for x in df['Description']]

output:

  column1                              Desc   Extracted data
0    date  January 15,2020 is important day  January 15,2020

Regex Link: https://regex101.com/r/ICDJCp/1
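
One caveat with the one-liner: re.match only matches at the start of the string and returns None when nothing matches, so indexing with [0] raises a TypeError for any row that does not begin with a date. A more defensive sketch, assuming you want a missing value for such rows, could look like this. The helper name first_date and the extra sample rows are hypothetical, and the pattern is loosened to \d{1,2} so single-digit days also match.

```python
import re
import pandas as pd

# Month name, 1-2 digit day, comma, optional space, four-digit year.
DATE_RE = re.compile(r'[A-Za-z]+ \d{1,2},\s?\d{4}')

def first_date(text):
    """Return the first date-like substring, or None if there is none."""
    match = DATE_RE.search(text)  # search scans the whole string, not just the start
    return match.group(0) if match else None

df = pd.DataFrame({'Description': [
    'January 15,2020 is important day',
    'Important day: March 3,2021',
    'nothing to extract',
]})

df['Extracted data'] = df['Description'].apply(first_date)
print(df)
```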

score 0 · Answer 3 · answered Feb 11 '20 at 07:23

import dateutil.parser as dparser
import pandas as pd

df = pd.DataFrame({'column 1': ['date'], 'Description': ['January 15,2020 is important day']})
df['Extracted Data'] = df['Description'].apply(lambda x: dparser.parse(x,fuzzy=True).strftime('%B %d %Y'))
print(df)

Output:

  column 1                       Description   Extracted Data
0     date  January 15,2020 is important day  January 15 2020
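
If you want an actual datetime column rather than a reformatted string, a rough alternative is to extract the date text with a regex first and then hand it to pd.to_datetime with an explicit format. This is only a sketch under the assumption that every date looks like "Month D,YYYY". The column name 'Parsed Date' and the second sample row are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Description': [
    'January 15,2020 is important day',
    'no date in this row',
]})

# Pull out the date text; rows without a match become NaN.
extracted = df['Description'].str.extract(r'([A-Za-z]+ \d{1,2},\d{4})', expand=False)

# Parse with an explicit format; errors='coerce' turns unparseable values into NaT.
df['Parsed Date'] = pd.to_datetime(extracted, format='%B %d,%Y', errors='coerce')

print(df)
```

This avoids fuzzy parsing, which can guess a date from unrelated numbers in the text, but it only handles the one format given.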
