So I know that you can't use if statements on a pandas dataframe according to this post, or you will get this error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So how do you apply a function with multiple conditions?
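For reference, the error can be reproduced in a few lines. This is a minimal sketch using a made-up Series (the values and the names s, mask and message are only for illustration, not my real data):

```python
import pandas as pd

# Hypothetical sample data, not the real CRM export
s = pd.Series(["United States", "Canada"])

# .str.contains returns a boolean Series with one value per row
mask = s.str.contains("Canada")

message = None
try:
    if mask:  # `if` needs a single True/False, but mask holds several values
        pass
except ValueError as err:
    message = str(err)
    print(message)
```

As far as I can tell, the error appears whenever a whole boolean Series is used where a single True/False is expected.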
I have a dataframe of exported CRM data that contains a countries column, which I need to convert to 2-letter country codes (United States to US and so on).
Here is a list of the unique values contained in the column of countries:
['United States', 'Canada', 'Australia', 'United Kingdom', 'US',
'Germany', 'New Zealand', 'Netherlands', 'Mexico', 'France',
'Ireland', 'Dominican Republic', 'Puerto Rico', 'Taiwan', 'USA',
'1', 'united States', 'United Staes', 'United State', 'usa',
'United Sates', 'United Stated', 'usaa', 'Unite States', 'nv',
'canada', 'Pakistan']
My solution was to try something like this:
    def country_codes(country):
        if country.str.contains(r'(United Kingdom)'):
            return 'GB'
        elif country.str.contains(r'(Canada|canada)'):
            return 'CA'
        elif country.str.contains(r'(Australia)'):
            return 'AU'
        elif country.str.contains(r'(United|US|USA|State|usa)'):
            return 'US'
        elif country.str.contains(r'(Germany)'):
            return 'DE'
        elif country.str.contains(r'(New Zealand)'):
            return 'NZ'
        elif country.str.contains(r'(Netherlands)'):
            return 'NL'
        elif country.str.contains(r'(Mexico)'):
            return 'MX'
        elif country.str.contains(r'(France)'):
            return 'FR'
        elif country.str.contains(r'(Ireland)'):
            return 'IE'
        elif country.str.contains(r'(Dominican)'):
            return 'DO'
        elif country.str.contains(r'(Puerto)'):
            return 'PR'
        elif country.str.contains(r'(Taiwan)'):
            return 'TW'
        else:
            return country
but upon trying df.apply(country_codes) I am getting the same ValueError. If there's an easier way to do this without regex matching, I'm open to that as well.
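One direction that might work, sketched here under assumptions rather than as a confirmed fix: df.apply hands the function a whole column (a Series), so each .str.contains call returns a boolean Series and the if fails. The version below takes a single string instead and is applied to one column with Series.map. The column name country, the sample rows, and the names PATTERNS and country_code are hypothetical; the patterns themselves are the same ones as in my attempt.

```python
import re
import pandas as pd

# Ordered (pattern, code) pairs. Order matters: "United Kingdom" must be
# tested before the looser "United" pattern used for the US.
PATTERNS = [
    (r"United Kingdom", "GB"),
    (r"Canada|canada", "CA"),
    (r"Australia", "AU"),
    (r"United|US|USA|State|usa", "US"),
    (r"Germany", "DE"),
    (r"New Zealand", "NZ"),
    (r"Netherlands", "NL"),
    (r"Mexico", "MX"),
    (r"France", "FR"),
    (r"Ireland", "IE"),
    (r"Dominican", "DO"),
    (r"Puerto", "PR"),
    (r"Taiwan", "TW"),
]

def country_code(country):
    """Return a 2-letter code for ONE country string, or the input unchanged."""
    for pattern, code in PATTERNS:
        # re.search on a plain string returns a match object or None,
        # so it is safe to use directly in an if statement
        if re.search(pattern, country):
            return code
    return country

# Hypothetical sample frame; "country" is an assumed column name
df = pd.DataFrame(
    {"country": ["United States", "canada", "United Kingdom", "usaa", "nv", "Pakistan"]}
)

# Series.map calls country_code once per value instead of once per column
df["code"] = df["country"].map(country_code)
print(df)
```

Values that match no pattern (such as 'nv', '1' or 'Pakistan') pass through unchanged, same as the else branch in my attempt. The matching is still case-sensitive, so any spelling not covered by the patterns would need its own entry.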