How to add conditional columns to pandas df

Question

I want to create a column in a dataframe that is conditionally filled with values. Basically my dataframe loks like this

  Origin     X
0 Guatemala  x
1 China      x
2 Kenya      x
3 Venezuela  x
4 Bangladesh x

What I want to do now is create an additional column 'Continent', which adds the continent dependent on the country. My result would look like this:

 Origin      X  Continent
0 Guatemala  x  South america
1 China      x  Asia
2 Kenya      x  Africa
3 Venezuela  x  South america
4 Bangladesh x  Asia

I have tried the following codes to accieve what i want:

def GetContinents(x):
    if x['Origin']== 'Thailand' or 'Indonesia' or 'China' or 'Japan' or 'Bangladesh':
        return 'Asia'
    elif x['Origin']== 'Boliva' or 'Guatemala' or 'Venezuela' or 'Mexico' or 'Argentinia':
        return 'South America'
    elif x['Origin']== 'Guinea Bissau' or 'Egypt' or 'Zaire' or 'Kenya':
        return 'Africa'
    else:
        return 'unknown'

df['Continent']= df.apply(GetContinents, axis=1)

This one fills all the columns in 'continent' with 'Asia' mysteriously.

df['Continent'] = np.where(df['Origin'] == 'Bangladesh', 'Asia', 'unknown')

This one works fine in terms that it fills 'Asia' into the right column and unknown into all others, but when I try to make something like df['Continent'] = np.where(df['Origin'] == 'Bangladesh' or 'China', 'Asia', 'unknown') I get an error.

So basically my question is: how can I fullfill my if condition with different values?

score 1 · Accepted Answer · edited May 23 '17 at 11:45

You can construct the lists for each continent and apply a func:

In [35]:
asia = ['Thailand','Indonesia','China','Japan','Bangladesh']
south_america = ['Boliva' , 'Guatemala' , 'Venezuela' , 'Mexico' , 'Argentinia']
africa = [ 'Guinea Bissau' , 'Egypt' , 'Zaire' , 'Kenya']
def find_continent(x):
    if x in asia:
        return 'Asia'
    elif x in south_america:
        return 'South America'
    elif x in africa:
        return 'Africa'
    else:
        return 'Unknown'
df['Continent'] = df['Origin'].apply(find_continent)
df

Out[35]:
       Origin  X      Continent
0   Guatemala  x  South America
1       China  x           Asia
2       Kenya  x         Africa
3   Venezuela  x  South America
4  Bangladesh  x           Asia

Or if you have a much larger df then you can just make successive calls using isin and mask the rows using loc:

In [38]:
df.loc[df['Origin'].isin(asia),'Continent'] = 'Asia'
df.loc[df['Origin'].isin(south_america),'Continent'] = 'South America'
df.loc[df['Origin'].isin(africa),'Continent'] = 'Africa'
df['Continent'] = df['Continent'].fillna('Unknown')
df

Out[38]:
       Origin  X      Continent
0   Guatemala  x  South America
1       China  x           Asia
2       Kenya  x         Africa
3   Venezuela  x  South America
4  Bangladesh  x           Asia

As to why your attempts didn't work:

if x['Origin']== 'Thailand' or 'Indonesia' or 'China' or 'Japan' or 'Bangladesh'

This returns True because or 'Indonesia' is always True so all rows get set to Asia.

You should change it to like this:

if x['Origin'] in ('Thailand' , 'Indonesia' , 'China' , 'Japan' , 'Bangladesh'):

See related: How do I test one variable against multiple values?

Using np.where would be fine but you're not masking the rows so you continuously overwrite the rows so only the last op persists.

Great! Thanks also for the explanations why my code did not work! — sequence_hard, Oct 26 '15 at 12:13

How to add conditional columns to pandas df

1 Answers1