Create a new dataframe column by comparing two other columns in different dataframes

Question

I have a DataFrame which contains Alpha 2 country codes (UK, ES, SL etc) and I need these to be the country names. I created a second data frame that has all the Alpha 2 country codes in one column and the corresponding names in another.

I'm trying to compare these two columns then using the index to create the new column. However I am struggling to do this without using a loop. I feel like there is a more efficient way to do this without looping?

I have tried using a for loop, iterating over:

cube_data = pd.DataFrame({'Country Code':['UK','ES','SL']})
alpha2 = pd.DataFrame({'Code':['ES','GH','UK','SL'],
               'Name':['Spain','Ghana','United Kingdom','Sierra Leone']})
cube_data
  Country Code
0           UK
1           ES
2           SL

alpha2
  Code            Name
0   ES           Spain
1   GH           Ghana
2   UK  United Kingdom
3   SL    Sierra Leone

I have used a for loop to iterate through the columns and when the code from cube_data is found in alpha2['Code'] the index is used to create a new series which has alpha['Name'] at the correct position corresponding to the cube_data.

end result is:

cube_data
  Country Code    Name
0   UK  United Kingdom
1   ES           Spain
2   SL    Sierra Leone

Surely there is a better way to do this without looping? I have had a look at series.isin() and series.map() but these do not seem to provide the result I need.

Can this be done without a loop?

[`df.merge()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html) — smci, Jan 15 '19 at 11:38

score 2 · Answer 1 · answered Jan 15 '19 at 11:14

You can use pandas merge:

df = alpha2.merge(cube_data, left_on='Code', right_on='Country Code', how='inner').drop('Code', axis=1)

merge works like an SQL join: here we merge alpha2 with cube_data. We use the columns 'Code' from alpha2 and 'Country Code' from cube_data to merge the two datframes together and use an 'inner' join logic meaning that only values present in both dataframes will be kept. Finally we drop the column 'Code' from alpha2 which contains the same values as the column 'Country Code'

user3471881 · Accepted Answer · 2019-01-15T11:08:49.003

Use map after converting alpha2 to a mappable object.

First we make our map:

>> country_map = alpha2.set_index('Code')['Name'].to_dict()
>> # country_map = dict(alpha2[['Code', 'Name']].values)
>> # country_map = alpha2.set_index('Code')['Name']
>> print(country_map)
{'ES': 'Spain', 'UK': 'United Kingdom', 'GH': 'Ghana', 'SL': 'Sierra Leone'}

Then we map it on the Country Code column:

>> cube_data['Country'] = cube_data['Country Code'].map(country_map)
>> print(cube_data)
Country Code         Country
0           UK  United Kingdom
1           ES           Spain
2           SL    Sierra Leone

score 0 · Answer 3 · answered Jan 15 '19 at 11:36

Have you looked into the pycountry module?

I've changed your 'UK' alpha_2 to 'GB'.

import pandas as pd
import pycountry

cube_data = pd.DataFrame({'Country Code':['GB','ES','SL']})

for alpha2_code in cube_data['Country Code']:
    c = pycountry.countries.get(alpha_2=alpha2_code)
    print(c.name)

output:

United Kingdom
Spain
Sierra Leone

Using a lambda to create new column

df = cube_data
df['Name'] = df['Country Code'].apply(lambda x: pycountry.countries.get(alpha_2=x).name)
print(df)

output:

  Country Code            name
0           GB  United Kingdom
1           ES           Spain
2           SL    Sierra Leone

Create a new dataframe column by comparing two other columns in different dataframes

3 Answers3