replacing string with a different string in pandas depending on value

Question

I am practicing pandas and I am stuck on an exercise.

I have an Excel file with a column that stores two types of URLs.

df = pd.DataFrame({'id': [None] * 4,  # ids are not filled in yet
                   'url': ['www.something/12312', 'www.something/12343', 'www.somethingelse/42312', 'www.somethingelse/62343']})

| id | url |
| -------- | -------------- |
|          | 'www.something/12312'    |
|          | 'www.something/12343'    |
|          | 'www.somethingelse/42312'    |
|          | 'www.somethingelse/62343'    |

I am supposed to turn these URLs into ids. Only the number from the URL should be part of the id, so the new id column should look like this:

df = pd.DataFrame({'id': ['id_12312', 'id_12343', 'diffid_42312', 'diffid_62343'], 'url': ['www.something/12312', 'www.something/12343', 'www.somethingelse/42312', 'www.somethingelse/62343']})

| id | url |
| -------- | -------------- |
| id_12312    | 'www.something/12312'  |
| id_12343    | 'www.something/12343'    |
| diffid_42312    | 'www.somethingelse/42312'    | 
| diffid_62343    | 'www.somethingelse/62343'    |

My problem is how to extract only the number and attach the prefix that matches the kind of URL. I have tried the replace and extract functions in pandas:

id_replaced = df.replace(regex={re.search('something', df['url']): 'id_' + str(re.search(r'\d+', i).group()), re.search('somethingelse', df['url']): 'diffid_' + str(re.search(r'\d+', i).group())})
        
df['id'] = df['url'].str.extract(re.search(r'\d+', df['url']).group())

However, both attempts raise an error: TypeError: expected string or bytes-like object.

Sorry for the tables in codeblock. The page was screaming that I have code that is not properly formatted when it was not in a codeblock.

Please format your examples so they are reproducible: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — nocibambi, May 27 '21 at 09:49
what exactly is `diffid`? When do you use `id` as prefix and when to use `diffid`? — Danail Petrov, May 27 '21 at 11:43

Danail Petrov · Accepted Answer · 2021-05-27T11:53:32.353

3

Here is one solution, though I don't quite understand when you use the id prefix and when you use diffid.

>>> df.id = 'id_'+df.url.str.split('/', n=1, expand=True)[1]
>>> df
         id                      url
0  id_12312      www.something/12312
1  id_12343      www.something/12343
2  id_42312  www.somethingelse/42312
3  id_62343  www.somethingelse/62343

Or using str.extract

>>> df.id = 'id_' + df.url.str.extract(r'/(\d+)$')
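As a quick check, here is a minimal self-contained sketch comparing the two approaches above on the sample URLs from the question. The variable names `by_split` and `by_extract` are made up for illustration, and `expand=False` is an addition here so that `str.extract` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'url': ['www.something/12312', 'www.something/12343',
                           'www.somethingelse/42312', 'www.somethingelse/62343']})

# Approach 1: split once on the first '/' and keep the part after it.
by_split = 'id_' + df['url'].str.split('/', n=1, expand=True)[1]

# Approach 2: capture the digits that follow the last '/'.
# expand=False (an addition in this sketch) returns a Series.
by_extract = 'id_' + df['url'].str.extract(r'/(\d+)$', expand=False)

# Both approaches should agree row by row on this sample data.
print((by_split == by_extract).all())
```

On URLs with more than one slash the two would differ: the split keeps everything after the first '/', while the regex only matches digits at the very end.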

edited May 27 '21 at 11:53

answered May 27 '21 at 11:47


Thank you. The prefix is supposed to be different for a different web page, so when I have a webpage somethingelse the prefix is diffid_, but when I have webpage something the prefix is id_ – Paulina May 27 '21 at 12:30
Thanks, I managed to solve it for the prefix too, thanks to your help :) `df['id_num'] = df.url.str.extract(r'/(\d+)$').astype(str)` `df['id_prefix'] = np.where((df['url'].str.contains('somethingelse')), 'diffid_', 'id_')` `df['id'] = df['id_prefix'] + df['id_num']` – Paulina May 27 '21 at 12:38
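For anyone landing here later, the prefix logic from the comment above can be put together as one runnable sketch. It assumes the sample data from the question and only two kinds of URL. The intermediate names `id_num`, `is_other_site` and `prefix` are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'url': ['www.something/12312', 'www.something/12343',
                           'www.somethingelse/42312', 'www.somethingelse/62343']})

# Digits after the final '/'; expand=False gives a Series of strings.
id_num = df['url'].str.extract(r'/(\d+)$', expand=False)

# 'somethingelse' contains 'something', so test for the longer name;
# testing for 'something' would match every row.
is_other_site = df['url'].str.contains('somethingelse', regex=False)

# Pick the prefix per row, then wrap it in a Series aligned with df.
prefix = pd.Series(np.where(is_other_site, 'diffid_', 'id_'), index=df.index)

df['id'] = prefix + id_num
print(df)
```

With more than two sites, `np.where` gets awkward. `np.select` or a dict lookup on the extracted site name would probably scale better.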
