
I have a pandas dataframe with a column 'Country' that has values like these: 'Switzerland17', 'Bolivia (Plurinational State of)'. I want to convert them to just 'Switzerland', 'Bolivia'. How can I do that?

PS: I can solve this with for loops, but that takes a long time on a dataframe. Is there a pandas function that does this instead?

Harsha
  • Without seeing sample data, we cannot tell which cases need handling. – BENY May 06 '20 at 01:31
  • Duplicate question: https://stackoverflow.com/questions/40691451/how-to-remove-digits-from-the-end-of-a-string-in-python-3-x – Branson Fox May 06 '20 at 01:37
  • @BransonFox in that question each string is changed individually, which only works when all the strings are known in advance. Here I have a whole dataframe column and want to use pandas functions to change the values. – Harsha May 06 '20 at 01:50

4 Answers


If digits and an opening parenthesis are the only markers of where the unwanted part begins, you can split the string on '(' and keep the first part, then split that result on digits and again keep only the first part.

a = 'Bolivia (Plurinational State of)'
a.split("(")[0].strip()

gives you 'Bolivia' (strip() removes the space left before the parenthesis).

import re
b = 'Switzerland17'
re.compile('[0-9]').split(b)[0]

gives you 'Switzerland', discarding everything from the first digit onward.
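The same two splits can be run over a whole column with pandas' .str accessor instead of an explicit Python loop. A minimal sketch, assuming the column is named 'Country' as in the question and holds only strings; 'Norway3' is a made-up sample value:

```python
import pandas as pd

# Sample data: the first two values come from the question, 'Norway3' is hypothetical.
df = pd.DataFrame({'Country': ['Switzerland17',
                               'Bolivia (Plurinational State of)',
                               'Norway3']})

df['Country'] = (
    df['Country']
    .str.split('(').str[0]                       # keep the text before the first '('
    .str.replace(r'[0-9].*$', '', regex=True)    # drop everything from the first digit on
    .str.strip()                                 # remove leftover surrounding whitespace
)
print(df['Country'].tolist())  # ['Switzerland', 'Bolivia', 'Norway']
```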

Shan R
  • How can I do that for all values in the column of the dataframe (without using a for loop)? – Harsha May 06 '20 at 02:01
  • You can put that code in two functions and apply them to the dataframe column, e.g. df['col'].apply(fn_name) – Shan R May 06 '20 at 02:17
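import re

def mysplit(a):
    b = a.split("(")[0]                              # drop any parenthesized suffix
    return re.compile('[0-9]').split(b)[0].rstrip()  # drop digits onward, then trailing spaces

df['Country'] = df['Country'].apply(mysplit)

This handles both the parenthesized suffix and the trailing digits in a single apply call; assigning the result back updates the column.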
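A quick check of mysplit on a small sample frame. 'United Kingdom' is a hypothetical value added to show that spaces inside a name survive:

```python
import re
import pandas as pd

def mysplit(a):
    b = a.split("(")[0]                              # drop any parenthesized suffix
    return re.compile('[0-9]').split(b)[0].rstrip()  # drop digits onward, then trailing spaces

df = pd.DataFrame({'Country': ['Switzerland17',
                               'Bolivia (Plurinational State of)',
                               'United Kingdom']})
df['Country'] = df['Country'].apply(mysplit)
print(df['Country'].tolist())  # ['Switzerland', 'Bolivia', 'United Kingdom']
```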

MaxCoder

So you have data like:

string = 'Switzerland17'

You can remove the numeric ending with the sub function from the re module:

import re
no_digits = re.sub(r'\d+$', '', string)

We get:

>>> no_digits
'Switzerland'
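To apply the same trailing-digit pattern to every row, pandas' Series.str.replace accepts a regular expression when regex=True is passed. A minimal sketch, assuming a 'Country' column of strings. The pattern only removes digits at the end of each value, so a parenthesized suffix is left in place:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Switzerland17',
                               'Norway3',
                               'Bolivia (Plurinational State of)']})

# Remove one or more digits anchored at the end of each string.
df['Country'] = df['Country'].str.replace(r'\d+$', '', regex=True)
print(df['Country'].tolist())
# ['Switzerland', 'Norway', 'Bolivia (Plurinational State of)']
```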
Branson Fox

Let's say we have an example dataframe df as

    Country
0   Switzerland24
1   USA53
2   Norway3

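You can use the built-in filter() function to keep only the alphabetic characters. Note that this also removes spaces, so 'United Kingdom' would become 'UnitedKingdom', and 'Bolivia (Plurinational State of)' would become 'BoliviaPlurinationalStateof' rather than 'Bolivia':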

df['Country'] = df['Country'].apply(lambda s : ''.join(filter(lambda x: x.isalpha(), s)))
print(df)


    Country
0   Switzerland
1   USA
2   Norway

or,

def remove_digits(s):
    for x in range(10):
        s = s.replace(str(x), '')
    return s

df['Country'] = df['Country'].apply(remove_digits)
print(df)

       Country
0  Switzerland
1          USA
2       Norway
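The loop in remove_digits calls replace() ten times per string. A single-pass alternative (swapped in here, not part of the answer above) is a translation table built with str.maketrans, which pandas applies column-wise through Series.str.translate:

```python
import string
import pandas as pd

df = pd.DataFrame({'Country': ['Switzerland24', 'USA53', 'Norway3']})

# Map every ASCII digit 0-9 to None, i.e. delete it.
table = str.maketrans('', '', string.digits)
df['Country'] = df['Country'].str.translate(table)
print(df['Country'].tolist())  # ['Switzerland', 'USA', 'Norway']
```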
Akash Karnatak