0

I have the following dataframe:

column 1   Description                          Extracted Data
date       January 15,2020 is important day

I want to get the following result:

column 1   Description                          Extracted Data
date       January 15,2020 is important day     January 15,2020

df.loc[df['column 1']=='date','Extracted Data']=df['Description'].str.extract(r'((January)|[/. ])|(\d{1,2}|[/., ]|\d{4})')

But I am not getting the desired result. Instead, I am getting a dataframe with all NaN values. How can I fix this?

jezrael
TLanni

3 Answers

1

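Use .* (any run of characters) followed by a comma and four digits. A single capture group makes str.extract return one column, which can be assigned directly.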

import pandas as pd

df = pd.DataFrame({'column 1': ['date'], 'Description': ['January 15,2020 is important day']})
df['Extracted Data'] = df['Description'].str.extract(r'(.*,\d{4})')

Output:

  column 1                       Description   Extracted Data
0     date  January 15,2020 is important day  January 15,2020
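If the extraction should only apply to rows where column 1 is 'date', as in the question, a slightly stricter pattern and a row mask may help. The following is a minimal sketch, not a definitive fix: the second sample row, the names mask, pattern and extracted, and the exact date pattern are assumptions made for illustration. One likely reason the original attempt misbehaves is that its pattern contains several capture groups, so str.extract returns a multi-column DataFrame rather than a single Series.

```python
import pandas as pd

# Sample data; the second row is a made-up example of a row to skip.
df = pd.DataFrame({
    'column 1': ['date', 'other'],
    'Description': ['January 15,2020 is important day',
                    'March 3,2021 was a holiday'],
})

# One capture group: month name, 1-2 digit day, comma, 4-digit year.
pattern = r'([A-Za-z]+ \d{1,2},\d{4})'

# expand=False with a single group returns a Series instead of a DataFrame.
extracted = df['Description'].str.extract(pattern, expand=False)

# Keep the extracted value only where column 1 equals 'date'.
mask = df['column 1'] == 'date'
df['Extracted Data'] = extracted.where(mask)

print(df)
```

Rows that do not satisfy the mask, or whose description does not match the pattern, end up as NaN in Extracted Data.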
Zaraki Kenpachi
1

This works (one-liner, after importing re):

import re
df['Extracted data'] = [re.match(r'[A-Za-z]+ \d{2},\d{4}', x)[0] for x in df['Description']]

output:

  column1                              Desc   Extracted data
0    date  January 15,2020 is important day  January 15,2020

Regex Link: https://regex101.com/r/ICDJCp/1
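Note that re.match only matches at the start of the string and returns None when nothing matches, so indexing with [0] raises a TypeError on such rows. Below is a hedged sketch of a more forgiving variant. The helper name first_date, the compiled DATE_RE, the second sample row, and the relaxed \d{1,2} day are assumptions for illustration, not part of the answer above.

```python
import re
import pandas as pd

# Month name, 1-2 digit day, comma, 4-digit year (assumed date shape).
DATE_RE = re.compile(r'[A-Za-z]+ \d{1,2},\d{4}')

def first_date(text):
    """Return the first date-like substring, or None if there is none."""
    m = DATE_RE.search(text)  # search scans the whole string, unlike match
    return m.group(0) if m else None

df = pd.DataFrame({'Description': ['January 15,2020 is important day',
                                   'nothing to see']})
df['Extracted data'] = [first_date(x) for x in df['Description']]
print(df)
```

Using re.search also picks up dates that are not at the very beginning of the description, for example first_date('Due on March 3,2021').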

Strange
0
import dateutil.parser as dparser
import pandas as pd

df = pd.DataFrame({'column 1': ['date'], 'Description': ['January 15,2020 is important day']})
df['Extracted Data'] = df['Description'].apply(lambda x: dparser.parse(x,fuzzy=True).strftime('%B %d %Y'))
print(df)
  column 1                       Description   Extracted Data
0     date  January 15,2020 is important day  January 15 2020
Rahul Verma