Can you please help me with the following issue? I have a column called "names" in a pandas DataFrame that contains links to web pages. I need to create a column called "total categories" that contains the part of each link that appears after the last "/". Example:

names
https://www1.abc.com/aaa/72566-finance
https://www1.abc.com/aaa1/725-z2
https://www1.abc.com/aaa2/75-z3

total categories
72566-finance
725-z2
75-z3

I tried this code:

def find_index(x):
    return x.rindex('/')

data_pd['total categories'] = data_pd['names'].apply(find_index)

I receive the following error:

AttributeError: 'float' object has no attribute 'rindex'
Alberto Alvarez
  • OK, so what have you tried so far? Does `split("/")[-1]` not work for you? – MattDMo Jun 17 '22 at 17:52
  • Does this answer your question? [How to get everything after last slash in a URL?](https://stackoverflow.com/questions/7253803/how-to-get-everything-after-last-slash-in-a-url) – quartzic Jun 17 '22 at 17:56

2 Answers

If you have these set up as columns in a pandas DataFrame, you can do the following:

df['total categories'] = df['names'].str.split('/').str[-1]

This will split the string based on the passed delimiter, '/', and then take the last element of the resulting splits.
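As a minimal sketch of why this also avoids the AttributeError from the question: that error most likely means the "names" column contains a missing value (NaN is a float), and the `.str` accessor passes missing values through instead of calling a string method on them. The third row below is a hypothetical missing value added for illustration; it is not part of the question's sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "names": [
        "https://www1.abc.com/aaa/72566-finance",
        "https://www1.abc.com/aaa1/725-z2",
        np.nan,  # hypothetical missing value (assumption, not in the question)
    ]
})

# Split each URL on "/" and keep the last piece; NaN rows stay NaN.
df["total categories"] = df["names"].str.split("/").str[-1]

# Alternative: split only once, from the right, which avoids building
# a list of every path segment.
alt = df["names"].str.rsplit("/", n=1).str[-1]

print(df["total categories"].tolist())
```

If the missing rows should become empty strings rather than NaN, calling `.fillna("")` on the result is one option.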

Philip Ciunkiewicz

Use str.extract with the r'/([^/]+)$' regex:

df['total categories'] = df['names'].str.extract(r'/([^/]+)$')

output:

                                    names total categories
0  https://www1.abc.com/aaa/72566-finance    72566-finance
1        https://www1.abc.com/aaa1/725-z2           725-z2
2         https://www1.abc.com/aaa2/75-z3            75-z3

Regex description:

/       # match a literal /
(       # start capturing
[^/]+   # one or more non-/ characters
)       # end capturing
$       # end of string
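
To see the pattern in isolation, here is a small sketch using Python's `re` module alongside `str.extract`. The URL with a trailing slash is a hypothetical edge case, not taken from the question: because the pattern requires at least one non-"/" character before the end of the string, such a URL produces no match (NaN in pandas).

```python
import re
import pandas as pd

pattern = r"/([^/]+)$"

# Plain-Python check of the capture group on one URL from the question.
match = re.search(pattern, "https://www1.abc.com/aaa/72566-finance")
last_part = match.group(1)

# Hypothetical edge case: nothing follows the final "/", so there is no match.
no_match = re.search(pattern, "https://www1.abc.com/aaa/")

s = pd.Series([
    "https://www1.abc.com/aaa/72566-finance",
    "https://www1.abc.com/aaa/",  # hypothetical trailing-slash URL
])

# expand=False returns a Series (not a one-column DataFrame)
# when the pattern has a single capture group.
extracted = s.str.extract(pattern, expand=False)

print(last_part)
print(no_match)
print(extracted.tolist())
```
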
mozway