
This may be a basic question. I want to extract dates written with different separators (/, \ or -) from the text in a column, and create a new column in the DataFrame that contains only the extracted date.

Example: create a simple DataFrame

 # importing pandas as pd 
 import pandas as pd 

 # creating a dataframe 
 df = pd.DataFrame({'A': ['Jo', 'Bo', 'Mi'], 
       'B': ['blabla (21-07-2009)blablabla', 'texttexttext 12/04/2010', 
       r'textextblalba 28\03\2019)(12 texttext']})  # r'' keeps the backslashes literal
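A side note on the third value: in a normal (non-raw) Python string, `\03` and `\201` are octal escape sequences, so the backslash separators never reach pandas and no regex can match them. Prefixing the literal with `r` keeps them. A quick check:

```python
# Without the r prefix, "\03" and "\201" are parsed as octal escapes,
# so the string ends up containing control characters instead of backslashes.
plain = 'textextblalba 28\03\2019)(12 texttext'
raw = r'textextblalba 28\03\2019)(12 texttext'

print('\\' in plain)  # False: the backslashes were consumed as escapes
print('\\' in raw)    # True: the raw string keeps them literally
```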

Expected result:

 df = pd.DataFrame({'A': ['Jo', 'Bo', 'Mi'], 
       'B': ['blabla (21-07-2009)blablabla', 'texttexttext 12/04/2010', 
       r'textextblalba 28\03\2019)(12 texttext'],
       'C': ['21-07-2009', '12/04/2010', r'28\03\2019']})
Shahine Greene

1 Answer


You can use `str.extract` with a regular expression:

df["c"] = df["B"].str.extract(r'(\d+/\d+/\d+)')

Output:

In [4]: df["c"] = df["B"].str.extract(r'(\d+/\d+/\d+)')
In [5]: df
Out[5]:
    A                                      B           c
0  Jo           blabla (21/07/2009)blablabla  21/07/2009
1  Bo                texttexttext 12/04/2010  12/04/2010
2  Mi  textextblalba 28/03/2019)(12 texttext  28/03/2019
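The pattern above only matches `/`. A minimal sketch that accepts all three separators swaps the literal `/` for the character class `[-/\\]` (a hyphen, a slash or a backslash). It assumes every date is a 1-2 digit day, a 1-2 digit month and a 4-digit year, as in the sample data, and uses a raw string for the backslash value:

```python
import pandas as pd

df = pd.DataFrame({'A': ['Jo', 'Bo', 'Mi'],
                   'B': ['blabla (21-07-2009)blablabla', 'texttexttext 12/04/2010',
                         r'textextblalba 28\03\2019)(12 texttext']})

# [-/\\] matches '-', '/' or a literal backslash.
# Assumption: day and month have 1-2 digits, the year has 4 digits.
df['C'] = df['B'].str.extract(r'(\d{1,2}[-/\\]\d{1,2}[-/\\]\d{4})', expand=False)
print(df)
```

`expand=False` makes `str.extract` return a Series rather than a one-column DataFrame. This pattern also accepts mixed separators such as `21-07/2009`.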
Alex
  • Hi @Alex, thank you for your code. It only works when the separator is '/'; for '\' and '-' it doesn't work. – Shahine Greene Aug 20 '19 at 12:59
  • @ShahineGreene my answer was only valid for the first, unedited version of your question. Take a look at this [answer](https://stackoverflow.com/questions/46064162/extracting-dates-that-are-in-different-formats-using-regex-and-sorting-them-pa), which is far more comprehensive, although I don't think it accounts for the `\` separator. – Alex Aug 20 '19 at 13:03
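If mixed separators should be rejected, a named backreference can require the second separator to be the same character as the first. The sketch below also converts the extracted strings to datetimes by normalizing every separator to `-`; that step assumes all dates are day-first (dd-mm-yyyy), which the sample rows suggest but do not confirm. The column name `D` and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['Jo', 'Bo', 'Mi'],
                   'B': ['blabla (21-07-2009)blablabla', 'texttexttext 12/04/2010',
                         r'textextblalba 28\03\2019)(12 texttext']})

# (?P<sep>...) captures the first separator; (?P=sep) requires the second
# separator to be the same character, so "21-07/2009" would not match.
pattern = r'(?P<date>\d{1,2}(?P<sep>[-/\\])\d{1,2}(?P=sep)\d{4})'
parts = df['B'].str.extract(pattern)  # DataFrame with columns 'date' and 'sep'
df['C'] = parts['date']

# Assumption: every date is day-first. Replace any separator with '-'
# so a single format string can parse all rows.
normalized = df['C'].str.replace(r'[-/\\]', '-', regex=True)
df['D'] = pd.to_datetime(normalized, format='%d-%m-%Y')
print(df[['C', 'D']])
```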