I am practicing pandas and I have an exercise with which I have a problem
I have an excel file that has a column where two types of urls are stored.
df = pd.DataFrame({'id': [],
'url': ['www.something/12312', 'www.something/12343', 'www.somethingelse/42312', 'www.somethingelse/62343']})
| id | url |
| -------- | -------------- |
| | 'www.something/12312' |
| | 'www.something/12343' |
| | 'www.somethingelse/42312' |
| | 'www.somethingelse/62343' |
I am supposed to transform this into ids, but only number should be part of the id, the new id column should look like this:
df = pd.DataFrame({'id': [id_12312 , id_12343, diffid_42312, diffid_62343], 'url': ['www.something/12312', 'www.something/12343', 'www.somethingelse/42312', 'www.somethingelse/62343']})
| id | url |
| -------- | -------------- |
| id_12312 | 'www.something/12312' |
| id_12343 | 'www.something/12343' |
| diffid_42312 | 'www.somethingelse/42312' |
| diffid_62343 | 'www.somethingelse/62343' |
My problem is how to get only numbers and replace them if that kind of id? I have tried the replace and extract function in pandas
id_replaced = df.replace(regex={re.search('something', df['url']): 'id_' + str(re.search(r'\d+', i).group()), re.search('somethingelse', df['url']): 'diffid_' + str(re.search(r'\d+', i).group())})
df['id'] = df['url'].str.extract(re.search(r'\d+', df['url']).group())
However, they are throwing an error TypeError: expected string or bytes-like object.
Sorry for the tables in codeblock. The page was screaming that I have code that is not properly formatted when it was not in a codeblock.