Extract product code using a regular expression in Python and apply it to a column

Question

I have a pd.DataFrame with multiple columns, and one column contains URLs extracted from the web, e.g.:

url = "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html"

I have used a regular expression to extract the product code, as below:

re.findall('\d+', url)

However, if I try to replicate this across the entire dataset (which has multiple columns), I get an error:

regex = lambda x: x.re.findall('\d+')
df["new_column"] = df['url'].apply(regex)

AttributeError: 'str' object has no attribute 're'

In pandas, use `df['url'].str.extractall(r'(\d+)')` instead (the pattern needs a capture group). https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.extractall.html – Frank, Nov 19 '18 at 20:18
Use pandas str methods, df['url'].str.extract('(\d+)', expand = False) — Vaishali, Nov 19 '18 at 20:22

score 0 · Answer 1 · answered Nov 19 '18 at 20:22

Just use the same syntax in your lambda function that you used in your scalar example:

regex = lambda x: re.findall(r'\d+', x)

You probably want the zeroth element too, so you don't end up with a Series of lists:

regex = lambda x: re.findall(r'\d+', x)[0]
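
Note that indexing with [0] raises an IndexError for any URL that contains no digits. Here is a minimal sketch of a safer variant, next to the `.str.extract` approach suggested in the comments. The second URL and the names `first_number`, `code_apply` and `code_str` are made up for illustration; they are not from the question.

```python
import re
import pandas as pd

# Sample frame: the first URL is from the question, the second is a
# hypothetical URL with no digits, added to show the no-match case.
df = pd.DataFrame({
    "url": [
        "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
        "http://www.currys.co.uk/gbuk/s/no_code_here.html",
    ]
})

def first_number(url):
    """Return the first run of digits in url, or None if there is none."""
    match = re.search(r"\d+", url)
    return match.group(0) if match else None

# Option 1: apply a plain Python function to each value in the column.
df["code_apply"] = df["url"].apply(first_number)

# Option 2: the vectorised pandas string method; the pattern needs a
# capture group, and rows without a match come back as missing values.
df["code_str"] = df["url"].str.extract(r"(\d+)", expand=False)

print(df[["code_apply", "code_str"]])
```

Both columns hold the code as a string, so convert it afterwards if you need an integer.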

robertwest

df['url'].str.extract('(\d+)', expand = False) this one does the trick – EricA Nov 19 '18 at 21:59