
I'm working with a DataFrame in which I want to change entries in the Country column, e.g.:

'Bolivia (Plurinational State of)' should be 'Bolivia',

'Switzerland17' should be 'Switzerland'

I have defined the following function:

def process(w):
    for i in range(len(w)):
        if w[i] in ['(', ')', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '&', '/']:
            w = w[0:i]
            w = ''.join(w).replace(" ", "")
            break

    return w

which I then applied to the DataFrame using the pandas `apply` method:

energy['Country'] = energy['Country'].apply(process)

While most entries come out as intended, the result is not entirely correct. Some entries, such as

'United Kingdom of Great Britain and Northern Ireland' and 'United States of America20', have changed to 'UnitedKingdomofGreatBritainandNorthernIreland' and 'UnitedStatesofAmerica'.

What am I doing wrong? Also, what would be more efficient, concise code to achieve this result?

TrigonaMinima
  • Do you want to remove the integer part from the country name, or something else? – Rohit-Pandey Jan 16 '18 at 05:02
  • @Shubham Gupta, the honor code https://learner.coursera.help/hc/en-us/articles/209818863-Coursera-Honor-Code states `Your answers to homework, quizzes, and exams must be your own work` – Bharath M Shetty Jan 16 '18 at 05:08
  • These links https://stackoverflow.com/questions/41719259/pandas-dataframe-how-to-remove-numbers-from-string-terms-in-a-dataframe, https://stackoverflow.com/questions/20894525/how-to-remove-parentheses-and-all-data-within-using-pandas-python will get you part of the way. You can work from them. – Bharath M Shetty Jan 16 '18 at 05:12
  • Use this statement to do it: `a=''.join([i for i in a if i.isalpha()])` – Rohit-Pandey Jan 16 '18 at 05:23
  • @Dark, of course I would submit my own work. I was just wondering if someone could point me towards a solution better than my obviously primitive one. Thank you for the links though! – Shubham Gupta Jan 16 '18 at 09:24
  • @Rohit-Pandey, I already did that. Doesn't get the desired result. Bolivia (Plurinational State of) changes to BoliviaPlurinationalStateof. Although I could work with this, it's not what I was looking for. – Shubham Gupta Jan 16 '18 at 09:27
  • Could you add some test cases to your problem statement for clarification? – Rohit-Pandey Jan 16 '18 at 09:29
  • @Dark, is there a way I can pass a more general set of characters instead of listing them individually? – Shubham Gupta Jan 16 '18 at 09:34
  • @ShubhamGupta it's called regex. Try looking at some regex examples. – Bharath M Shetty Jan 16 '18 at 09:35
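Following the regex suggestion in the comments, the hard-coded list of characters can be collapsed into a single character class. This is a minimal sketch using Python's `re` module, assuming the same "cut" characters as the question's list (parentheses, the digits 0-9, `&` and `/`); `CUT_CHARS` and `clean_country` are hypothetical names, not part of the original code.

```python
import re

# One character class equivalent to the question's list:
# '(', ')', the ASCII digits 0-9, '&' and '/'.
CUT_CHARS = re.compile(r'[()0-9&/]')

def clean_country(name):
    # Hypothetical helper: keep only the text before the first cut character,
    # then trim trailing whitespace. Spaces between words are left alone.
    return CUT_CHARS.split(name, maxsplit=1)[0].rstrip()

print(clean_country('Bolivia (Plurinational State of)'))  # Bolivia
print(clean_country('Switzerland17'))                     # Switzerland
print(clean_country('United States of America20'))        # United States of America
```

`[0-9]` is used rather than `\d` because, for `str` patterns in Python 3, `\d` also matches non-ASCII Unicode digits, which would be broader than the question's list.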

1 Answer


I could be missing something, but it looks like

replace(" ", "")

is going to remove all spaces, which is exactly what is happening with UnitedStatesofAmerica.
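A minimal fix along the lines of this answer keeps the question's loop but drops only the trailing whitespace (for example the space before `(`) instead of every space. This is a sketch, assuming the question's list of cut characters is otherwise what is wanted; `str.rstrip()` is swapped in for the original `replace(" ", "")`.

```python
def process(w):
    # Same scan as in the question: find the first "cut" character.
    for i in range(len(w)):
        if w[i] in ['(', ')', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '&', '/']:
            # Keep the prefix and strip only trailing whitespace,
            # so the spaces between words survive.
            w = w[:i].rstrip()
            break
    return w

print(process('United States of America20'))  # United States of America
print(process('Bolivia (Plurinational State of)'))  # Bolivia
```

It is applied exactly as before, with `energy['Country'].apply(process)`.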

chrisfs
  • `Series.replace` also has a `regex` flag (`False` by default); with `regex=True` you can replace the values directly instead of looping through them. – TrigonaMinima Jan 16 '18 at 08:38
  • Thanks! That was exactly the case. Another question, though: American Samoa (and many entries like it) was not returned as AmericanSamoa, but United States of America was. Is there something I'm missing here? – Shubham Gupta Jan 16 '18 at 09:20
  • My guess is that since American Samoa has no numbers or special characters in it, it never triggers the `if` statement, so the `replace` call inside it never runs. If you found my answer useful, please click on the upward triangle next to my answer. It will give me points. – chrisfs Jan 17 '18 at 09:49
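For the "more concise" part of the question, and in the spirit of the regex flag mentioned in the comments, the whole column can be cleaned without an explicit loop. This sketch uses `Series.str.replace` with `regex=True` (a closely related method swapped in for the `Series.replace` named in the comment); the sample DataFrame is hypothetical and built only from country names in this thread. It also shows why American Samoa is unaffected: it contains no cut character, so nothing matches.

```python
import pandas as pd

# Hypothetical sample data built from the country names mentioned in this thread.
energy = pd.DataFrame({'Country': [
    'Bolivia (Plurinational State of)',
    'Switzerland17',
    'United States of America20',
    'American Samoa',
]})

# Delete the first cut character (parenthesis, digit, '&' or '/') and
# everything after it, then trim whitespace left at the end of the name.
energy['Country'] = (
    energy['Country']
    .str.replace(r'[()0-9&/].*', '', regex=True)
    .str.rstrip()
)

print(energy['Country'].tolist())
# ['Bolivia', 'Switzerland', 'United States of America', 'American Samoa']
```

Passing `regex=True` explicitly matters because the default for `str.replace` changed between pandas versions.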