0

I have this 'Club' column in my pandas df that contains the names of English Premier League clubs but the naming of the club is not suited for what I want to achieve. I tried writing a function with conditional statements to populate another column with the names of the club in the format I want. I have tried applying my function to my df but I get this error:

    ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

This is what df called test looks like: test df

This is my function called clubs_name:

#We want to rename the clubs to be exact names like in Squad column in the epl_table_df dataframe
    def clubs_name(Club):
if Club == 'Leicester City LEI':
    return 'Leicester City'
elif Club == 'Tottenham Hotspur TOT':
    return 'Tottenham'
elif Club == 'Liverpool LIV':
    return 'Liverpool'
elif Club == 'Southampton SOU':
    return 'Southampton'
elif Club == 'Chelsea CHE':
    return 'Chelsea'
elif Club == 'Aston Villa AVL':
    return 'Aston Villa'
elif Club == 'Everton EVE':
    return 'Everton'
elif Club == 'Crystal Palace CRY':
    return 'Crystal Palace'
elif Club == 'Wolverhampton Wanderers WOL':
    return 'Wolves'
elif Club == 'Manchester City MCI':
    return 'Manchester City'
elif Club == 'Arsenal ARS':
    return 'Arsenal'
elif Club == 'West Ham United WHU':
    return 'West Ham'
elif Club == 'Newcastle United NEW  ':
    return 'Newcastle Utd'
elif Club == 'Manchester United MUN':
    return 'Manchester Utd'
elif Club == 'Leeds United LEE':
    return 'Leeds United'
elif Club == 'Brighton and Hove Albion BHA':
    return 'Brighton'
elif Club == 'Fulham FUL':
    return 'Fulham'
elif Club == 'West Bromwich Albion WBA':
    return 'West Brom'
elif Club == 'Burnley BUR':
    return 'Burnley'
elif Club == 'Sheffield United SHU':
    return 'Sheffield Utd'
else:
    return Club' 

When I test my function, it seems to be working:

print(clubs_name('Fulham FUL'))

This was how I tried to apply the function to the test df:

test.apply (lambda Club: clubs_name(Club), axis=1)

I am new to python and data science/analysis. I will appreciate a solution, an explanation of the error and what I was doing wrong.

MansaSeyi
  • 11
  • 2
  • You didn't provide a reproducible sample of your data so it's hard to test against (see [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) ), but what about `test["Club"].apply(clubs_name)` – G. Anderson Nov 18 '20 at 23:14
  • As @G.Anderson said, I believe `test["Club"].apply(clubs_name)` should work. – Cainã Max Couto-Silva Nov 18 '20 at 23:18
  • Thanks, I tried this but it didn't work. It just gave me the Club column with the data still in the old format. I have attached a link to the csv file. [link](https://1drv.ms/u/s!Ar9s4m_3s940jX679mso2hDQ1BM_?e=D9lDba) – MansaSeyi Nov 19 '20 at 14:42
  • Thank you. I figured what the problem was. The Club value in my conditional statement had less spaces than in the Club column in my df. – MansaSeyi Nov 19 '20 at 23:09

1 Answers1

1

I think this can be easier achieved through panda's replace().

Simply create a dictionary of your old values to new values:

eg:

dict_replace = {
    'Tottenham Hotspur TOT':'Tottenham',
    'Liverpool LIV':'Liverpool',
    'Southampton SOU':'Southampton',
    'Chelsea CHE':'Chelsea'
    } #etc

And then use the dictionary to update the column in your dataframe:

assuming the column name in df you want to change is club

df['club'].replace(dict_replace, inplace=True)

or if you want a separate column, instead of overwriting:

df['club_name_new'] = df['club'].replace(dict_replace)

FULL TEST EXAMPLE:

import pandas as pd
df = pd.DataFrame({'club': ['Tottenham Hotspur TOT', 
                            'Liverpool LIV', 
                            'Southampton SOU', 
                            'Chelsea CHE', 
                            'Some other club'], 
                   'column': ['b', 'a', 'c', 'd', 'e'],'column2': [1, 2, 3, 4, 5]})
print('INITIAL DATAFRAME:')
print(df)
print('*'*10)

dict_replace = {
    'Tottenham Hotspur TOT':'Tottenham',
    'Liverpool LIV':'Liverpool',
    'Southampton SOU':'Southampton',
    'Chelsea CHE':'Chelsea'
    }

df['club_name_new'] = df['club'].replace(dict_replace)
print('DATAFRAME WITH NEW COLUMN NAMES:')
print(df)

returns processed df as:

                    club column  column2    club_name_new
0  Tottenham Hotspur TOT      b        1        Tottenham
1          Liverpool LIV      a        2        Liverpool
2        Southampton SOU      c        3      Southampton
3            Chelsea CHE      d        4          Chelsea
4        Some other club      e        5  Some other club

-- Comment follow up:

possible way to apply changes with rules:

## replace 'United' with 'Utd':
df['club'].str.replace('United', 'Utd')

## remove last 4 characters:
df['club'].str[:-4]

and then make a dictionary for remaining exceptions that don't follow the patterns, and apply that...

i.e. for specific conversion from some unique value to another, you would have to make a dictionary (how else would the program know what to change to?). But if the changes can be reduced to some pattern, you can use .str.replace()

yulGM
  • 894
  • 1
  • 5
  • 14
  • I tried this but it just created the new 'club_name_new' column with the same rows like in the 'Club' column. The rows didn't contain the dictionary values as intended. – MansaSeyi Nov 19 '20 at 15:21
  • 1
    did the values in your dictionary match the values in column exactly? it only replaces where there is a match – yulGM Nov 19 '20 at 15:39
  • I added a full text example. double check that you referencing correct dictionary, and that names in the dictionary match names you are looking up (spaces, capitalization, etc) – yulGM Nov 19 '20 at 15:54
  • Thank you so much @yulGM . Finally figured out why this and my earlier function did not work. It was a problem with the spaces as you suspected. Is there an easier way to go through the columns I want to replace without having to manually type out all the rows or create a dictionary like I did? Using a loop maybe? – MansaSeyi Nov 19 '20 at 23:05
  • it's kind of a different question :) and hard to tell without knowing your data, columns etc. But basically you would use dictionary to do a specific replace from one value to another. But otherwise maybe there are rules/logic about what motivates you to change the values? for example: 'remove last 4 characters'. replace 'United' with 'Utd'. You could do something like that, and then apply a dictionary change to make changes to outstanding exceptions. I'll add example of these 2 to answer so you can start. read up pandas on pd.str.replace() etc. – yulGM Nov 20 '20 at 01:03