I have a pd.DataFrame with multiple columns, one of which holds URLs extracted from the web, e.g.:

url = "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html"

I used a regular expression to extract the product code, as below:

re.findall(r'\d+', url)

However, when I try to apply the same approach to the whole dataset (which has multiple columns), I get an error:

regex = lambda x: x.re.findall('\d+')
df["new_column"] = df['url'].apply(regex)

'str' object has no attribute 're'

r.ook
EricA
    In pandas, use `df['url'].str.extractall(r'\d+')` instead. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.extractall.html – Frank Nov 19 '18 at 20:18
    Use pandas str methods, df['url'].str.extract('(\d+)', expand = False) – Vaishali Nov 19 '18 at 20:22
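The pandas string methods suggested in the comments can be sketched as below. This is a minimal example, not a definitive solution: the second URL and the column name `product_code` are made up for illustration. Note that `str.extract` and `str.extractall` both need a capture group in the pattern, so `r'(\d+)'` is used here rather than a bare `r'\d+'`.

```python
import pandas as pd

df = pd.DataFrame({
    "url": [
        "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
        "http://www.example.com/no-digits-here.html",  # hypothetical row without digits
    ]
})

# First run of digits in each URL; rows without a match get a missing value.
# expand=False returns a Series instead of a one-column DataFrame.
df["product_code"] = df["url"].str.extract(r"(\d+)", expand=False)

# Every run of digits, one row per match, indexed by (original row, match number).
# Rows without any match do not appear in the result.
all_matches = df["url"].str.extractall(r"(\d+)")
```

`str.extract` is probably the closer fit when each URL carries a single product code, since it keeps the result aligned with the original rows.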

1 Answer

Just use the same syntax in your lambda function that you used in your scalar example. `re` is a module, not an attribute of the string, so the string is passed to `re.findall` as an argument:

regex = lambda x: re.findall(r'\d+', x)

You probably want the zeroth element too, so you don't end up with a Series of lists:

regex = lambda x: re.findall(r'\d+', x)[0]
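One caveat with indexing `[0]`: it raises an IndexError for any URL that contains no digits. A possible variant, assuming you would rather get a missing value in that case, is sketched below. The helper name `first_number` and the second URL are hypothetical, and `re.search` is swapped in for `re.findall` because only the first match is needed.

```python
import re

import pandas as pd


def first_number(url):
    """Return the first run of digits in url, or None if there is none."""
    match = re.search(r"\d+", url)
    return match.group(0) if match else None


df = pd.DataFrame({
    "url": [
        "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
        "http://www.example.com/no-digits-here.html",  # hypothetical row without digits
    ]
})

# Apply the helper to each URL; rows without digits end up as missing values
df["new_column"] = df["url"].apply(first_number)
```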
robertwest