1

i have a data frame as below :

df = pd.DataFrame({"country_code": ['AF', 'BEL', 'AUS', 'DE', 'IND', 'US', 'GBR'],
              "amount": [100, 200, 140, 400, 225, 125, 600]})

Column country code is a mix of both 2 letter and 3 letter country abbreviation.

Can any one kindly help me how to get the Full Country name in a new column in the same df ?

Sheriff
  • 145
  • 1
  • 1
  • 9
  • Do you have a dataset mapping country abbreviations to their full name? – yorodm Dec 25 '18 at 15:02
  • Can't you just add another column? ` –  Dec 25 '18 at 15:06
  • yes, i have a mapping dataset with 3 columns, "Country_name", "2 letter", "3 letter". How do i merge by checking if my country either exists in "2 letter" or "3 letter" and accordingly return the "Country_name". There should be some library in python, which may help instead of looking at mapping dataset ? – Sheriff Dec 25 '18 at 15:07
  • @nixon, my issue is that my data is a mix of both alpha_2 and alpha_3 codes. How do i handle it ? – Sheriff Dec 25 '18 at 15:11
  • You can use `pycountry` which has both `.alpha_3` and `alpha_2` methods – yatu Dec 25 '18 at 15:13

2 Answers2

7

First of all you should install package pycountry by typing pip install pycountry in command prompt and press enter.

import pycountry
import pycountry
df = pd.DataFrame({"country_code": ['AF', 'BEL', 'AUS', 'DE', 'IND', 'US', 'GBR','XYZ'],
              "amount": [100, 200, 140, 400, 225, 125, 600,0]})

list_alpha_2 = [i.alpha_2 for i in list(pycountry.countries)]
list_alpha_3 = [i.alpha_3 for i in list(pycountry.countries)]    

def country_flag(df):
    if (len(df['country_code'])==2 and df['country_code'] in list_alpha_2):
        return pycountry.countries.get(alpha_2=df['country_code']).name
    elif (len(df['country_code'])==3 and df['country_code'] in list_alpha_3):
        return pycountry.countries.get(alpha_3=df['country_code']).name
    else:
        return 'Invalid Code'

df['country_name']=df.apply(country_flag, axis = 1)
df
   amount country_code    country_name
0     100           AF     Afghanistan
1     200          BEL         Belgium
2     140          AUS       Australia
3     400           DE         Germany
4     225          IND           India
5     125           US   United States
6     600          GBR  United Kingdom
7       0          XYZ    Invalid Code
cph_sto
  • 7,189
  • 12
  • 42
  • 78
4

Considering that you have the dataset, or you could get through pycountry, you can work it either with following methods.

import pycountry
new_df = df['country-code'].apply(lambda x: pycountry.countries.get(alpha_3=x).name if len(x) == 3 else pycountry.countries.get(alpha_2=x).name)
print new_df

This prints:

new_df
0       Afghanistan
1           Belgium
2         Australia
3           Germany
4             India
5     United States
6    United Kingdom
Name: country_code, dtype: object

Now, considering that you have csv for both codes of length 2 and length 3 like this:

 df2
  code           name
0   AF    Afghanistan
1   DE        Germany
2   US  United States

and

df3
  code            name
0  BEL         Belgium
1  AUS       Australia
2  IND           India
3  GBR  United Kingdom

After this you follow these steps:

>>> new_df2 = df.merge(df2, left_on='country_code', right_on='code')
>>> new_df2
   amount country_code code           name
0     100           AF   AF    Afghanistan
1     400           DE   DE        Germany
2     125           US   US  United States
>>> new_df3 = df.merge(df3, left_on='country_code', right_on='code')
>>> new_df3
   amount country_code code            name
0     200          BEL  BEL         Belgium
1     140          AUS  AUS       Australia
2     225          IND  IND           India
3     600          GBR  GBR  United Kingdom
>>> df23 = pd.concat([new_df2, new_df3])
>>> df23.reset_index(inplace=True)
>>> df23.drop('index', inplace=True, axis=1)
>>> df23
   amount country_code code            name
0     100           AF   AF     Afghanistan
1     400           DE   DE         Germany
2     125           US   US   United States
3     200          BEL  BEL         Belgium
4     140          AUS  AUS       Australia
5     225          IND  IND           India
6     600          GBR  GBR  United Kingdom
thelogicalkoan
  • 620
  • 1
  • 5
  • 13