How to remove trailing numbers and parentheses from column values like 'abc23', 'abc(xyz)' in a Pandas Dataframe?

Question

I have a pandas dataframe with a column 'Country' that has values like these: 'Switzerland17', 'Bolivia (Plurinational State of)'. I want to convert them to just 'Switzerland', 'Bolivia'. How can I do that?

PS: I can solve this with for loops, but that takes a long time on a dataframe. Is there a pandas function that does this for the whole column?

Duplicate question: https://stackoverflow.com/questions/40691451/how-to-remove-digits-from-the-end-of-a-string-in-python-3-x — Branson Fox, May 06 '20 at 01:37
@BransonFox In that question each string is changed manually, which only works if there are a handful of strings and all of them are known in advance. Here I have a dataframe and I want to use pandas dataframe functions to change the values. — Harsha, May 06 '20 at 01:50

Shan R · Answer 1 · 2020-05-06T02:13:22.063

2

If digits and an opening parenthesis are the only things that mark the start of what you want to discard, you can split the string on '(' and keep the first part, then split that result on digits and again keep the first part.

a = 'Bolivia (Plurinational State of)'
a.split("(")[0]

will give you Bolivia.

import re
b = 'Switzerland17'
re.compile('[0-9]').split(b)[0]

will give you Switzerland, discarding everything from the first digit onward.
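
The same two splits can probably be applied to a whole column without an explicit loop by going through the `.str` accessor. This is only a sketch: the sample dataframe is made up for illustration, and it assumes every value in the column is a string.

```python
import pandas as pd

# Hypothetical sample data, modelled on the values in the question.
df = pd.DataFrame(
    {"Country": ["Switzerland17", "Bolivia (Plurinational State of)", "Norway"]}
)

cleaned = (
    df["Country"]
    .str.split("(").str[0]        # keep the part before the first '('
    .str.split(r"[0-9]").str[0]   # keep the part before the first digit
    .str.strip()                  # drop the space left in front of '('
)
df["Country"] = cleaned
print(df)
```

A one-character pattern such as "(" is treated as a literal string by `str.split`, while a longer pattern such as `[0-9]` is treated as a regular expression, so the parenthesis does not need escaping here.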

edited May 06 '20 at 02:13

answered May 06 '20 at 01:56

Shan R


How can I do that for all values in the column of the dataframe? (without using for loop) – Harsha May 06 '20 at 02:01

You can put that code in two functions and apply the function for the data frame column such as df['col'].apply(fn_name) – Shan R May 06 '20 at 02:17

score 1 · Accepted Answer · answered May 06 '20 at 02:27


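import re
def mysplit(a):
    # Drop everything from the first '(' onward, then everything from the first digit onward.
    b = a.split("(")[0]
    return re.compile('[0-9]').split(b)[0].rstrip()
# apply() returns a new Series, so assign it back to the column.
df['Country'] = df['Country'].apply(mysplit)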

This will work.
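
If only a trailing number or a trailing parenthetical needs to go, a single vectorized `str.replace` with a regular expression may be enough. This is a sketch with a made-up dataframe, not the method from the answer above: unlike `mysplit`, it removes the suffix only when it sits at the very end of the string.

```python
import pandas as pd

# Hypothetical sample data, modelled on the values in the question.
df = pd.DataFrame(
    {"Country": ["Switzerland17", "Bolivia (Plurinational State of)", "Norway"]}
)

# Optional whitespace, then either "(...)" or a run of digits, anchored at the end.
suffix = r"\s*(?:\(.*\)|\d+)\s*$"

df["Country"] = df["Country"].str.replace(suffix, "", regex=True)
print(df)
```

Values without such a suffix, like 'Norway', pass through unchanged.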

answered May 06 '20 at 02:27

MaxCoder


score 0 · Answer 3 · answered May 06 '20 at 01:35


So you have data like:

string = 'Switzerland17'

We can replace the numeric ending using the re module sub function.

import re
no_digits = re.sub(r'\d+$', '', string)

We get:

>>> no_digits
'Switzerland'
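
To run that substitution over every value in a column rather than one known string, the compiled pattern can be mapped over the Series. A minimal sketch, assuming a made-up dataframe whose 'Country' column holds only strings:

```python
import re

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Country": ["Switzerland17", "USA53", "Norway"]})

# Same pattern as above: one or more digits at the end of the string.
trailing_digits = re.compile(r"\d+$")

# map() calls the function once per value; no explicit for loop is needed.
df["Country"] = df["Country"].map(lambda s: trailing_digits.sub("", s))
print(df)
```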

answered May 06 '20 at 01:35

Branson Fox


Yah, but I can do that only if I know the strings. I don't know how many such strings there are in the column. I just gave those two as examples. – Harsha May 06 '20 at 01:38
@Harsha The column in your dataframe contains strings. You need to apply the function to the entire column. – Branson Fox May 06 '20 at 01:40
do you mean I need to use a for loop? – Harsha May 06 '20 at 01:57

Akash Karnatak · Answer 4 · 2020-05-06T02:15:02.550

Let's say we have an example dataframe df as

    Country
0   Switzerland24
1   USA53
2   Norway3

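You can use the filter() function for this. Note that it keeps only alphabetic characters, so spaces and punctuation are removed along with the digits: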

df['Country'] = df['Country'].apply(lambda s : ''.join(filter(lambda x: x.isalpha(), s)))
print(df)


    Country
0   Switzerland
1   USA
2   Norway

or,

def remove_digits(s):
    for x in range(10):
        s = s.replace(str(x), '')
    return s

df['Country'] = df['Country'].apply(remove_digits)
print(df)

       Country
0  Switzerland
1          USA
2       Norway
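
Both approaches in this answer target digits, so it may be worth checking how they behave on the parenthesized value from the question. The sketch below uses plain strings; `keep_letters` is a hypothetical name for the filter-based lambda above.

```python
def keep_letters(s):
    # First approach: keep only alphabetic characters.
    return "".join(filter(str.isalpha, s))


def remove_digits(s):
    # Second approach: delete every digit character.
    for x in range(10):
        s = s.replace(str(x), "")
    return s


sample = "Bolivia (Plurinational State of)"
letters_only = keep_letters(sample)     # spaces and parentheses are dropped too
digits_removed = remove_digits(sample)  # the parenthetical part is left in place
print(letters_only)
print(digits_removed)
```

Neither call yields 'Bolivia', so values with a parenthetical suffix need an extra step, such as splitting on '(' first.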
