I have the following 122 country names for which I couldn't find the corresponding alpha-3 code. I tried search_fuzzy, but nothing was found.
Looking at some of the names, I could assign the alpha-3 codes manually based on common knowledge (for example, by creating a dict for renaming). However, I wonder whether there is a better, automated way to look up the alpha-3 codes, such as using another function from pycountry or even re?
Any suggestions and advice are greatly appreciated.
import pandas as pd
import numpy as np
import regex as re
import pycountry
missing = ['Americas', 'Asia', 'Australia and New Zealand', 'Bolivia (Plurinational State of)', 'Caribbean', 'Central America', 'Central and Southern Asia', 'Central Asia', 'China, Hong Kong Special Administrative Region', 'China, Macao Special Administrative Region', 'Democratic Republic of the Congo', 'Eastern Africa', 'Eastern and South-Eastern Asia', 'Eastern Asia', 'Eastern Europe', 'Europe', 'Europe and Northern America', 'Iran (Islamic Republic of)', 'Landlocked developing countries (LLDCs)', 'Latin America and the Caribbean', 'Least Developed Countries (LDCs)', 'Melanesia', 'Micronesia (Federated States of)', 'Middle Africa', 'Northern Africa', 'Northern Africa and Western Asia', 'Northern America', 'Northern Europe', 'Oceania', 'Oceania (exc. Australia and New Zealand)', 'Small island developing States (SIDS)', 'South America', 'South-Eastern Asia', 'Southern Africa', 'Southern Asia', 'Southern Europe', 'Sub-Saharan Africa', 'Türkiye', 'Venezuela (Bolivarian Republic of)', 'Western Africa', 'Western Asia', 'Western Europe', 'World', 'European Union (27)', 'Chinese Taipei', 'UAE', 'Belgium-Luxembourg', 'Channel Islands', 'China, Hong Kong SAR', 'China, Macao SAR', 'China, mainland', 'China, Taiwan Province of', 'Czechoslovakia', 'Ethiopia PDR', 'French Guyana', 'Netherlands Antilles (former)', 'Pacific Islands Trust Territory', 'Serbia and Montenegro', 'Sudan (former)', 'Svalbard and Jan Mayen Islands', 'United States Virgin Islands', 'USSR', 'Wallis and Futuna Islands', 'Yugoslav SFR', 'Global average', 'Cocos Islands', 'Macquarie Island', 'Northern Mariana Islands and Guam', 'Comoro Islands', 'Glorioso Islands', 'Juan de Nova Island', 'Bassas da India', 'Ile Europa', 'Ile Tromelin', 'Azores', 'Cape Verde', 'Canary Islands', 'Prince Edward Islands', 'Crozet Islands', 'Amsterdam Island and Saint Paul Island', 'Kerguelen Islands', 'Heard and McDonald Islands', 'Republique du Congo', 'Clipperton Island', 'Puerto Rico and Virgin Islands of the United States', 
'Guadeloupe and Martinique', 'Faeroe Islands', 'Line Islands (Kiribati)', 'Phoenix Islands (Kiribati)', 'Howland Island and Baker Island', 'Guinea Bissau', 'Ivory Coast', 'Gilbert Islands (Kiribati)', 'Northern Saint-Martin', 'East Timor', 'Oecussi Ambeno', 'Laos', 'Republic of Congo', 'Dem. Rep. Congo', 'ASEAN', 'BRIICS', 'DRC', 'EA19', 'EECCA', 'EU27_2020', 'European Union', 'G20', 'G7M', 'Lao PDR', 'OECD', 'OECDAM', 'OECDAO', 'OECDE', 'Grenade', 'Korea, Rep.', 'Egypt, Arab Rep.', 'Iran, Islamic Rep.', 'Korea (Rep.)', 'Hong Kong, China', 'Iran (Islamic Republic)', 'Cote dIvoire', 'Congo (Democratic Republic)']
not_found = []
for country in missing:
    try:
        print(pycountry.countries.search_fuzzy(country))
        print(country)
    except LookupError:  # search_fuzzy raises LookupError when nothing matches
        print('not found')
        not_found.append(country)
print(len(missing))    # 122
print(len(not_found))  # 122
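For reference, the kind of semi-automated approach I have in mind could look roughly like the sketch below: clean each name with re and unicodedata, try an exact lookup, and fall back to a small hand-written alias table. The names `ALIASES`, `candidate_names`, `resolve_alpha3` and `pycountry_lookup` are my own placeholders, not part of pycountry, and the alias entries are only illustrative. The one real pycountry call is `pycountry.countries.lookup`, which raises `LookupError` on a miss. Names in comma form such as 'China, Hong Kong SAR' are deliberately left to the alias table, because keeping only the part before the comma would wrongly resolve to China.

```python
import re
import unicodedata

# Hypothetical alias table for names that string cleanup cannot resolve.
# Keys are lower-case; the entries are illustrative, not exhaustive.
ALIASES = {
    "uae": "ARE",
    "drc": "COD",
    "dem. rep. congo": "COD",
    "ivory coast": "CIV",
    "cote divoire": "CIV",
    "east timor": "TLS",
    "laos": "LAO",
    "lao pdr": "LAO",
    "korea, rep.": "KOR",
    "korea (rep.)": "KOR",
    "china, hong kong sar": "HKG",
    "turkiye": "TUR",
}


def candidate_names(name):
    """Yield the original name followed by simplified spellings, without duplicates."""
    variants = [
        name,
        # "Bolivia (Plurinational State of)" -> "Bolivia"
        re.sub(r"\s*\(.*?\)", "", name),
        # "Türkiye" -> "Turkiye" (decompose, then drop the combining marks)
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii"),
    ]
    seen = set()
    for variant in variants:
        variant = variant.strip()
        if variant and variant not in seen:
            seen.add(variant)
            yield variant


def resolve_alpha3(name, lookup, aliases=ALIASES):
    """Return an alpha-3 code for name, or None if no candidate matches.

    `lookup` is any callable that returns a code or raises LookupError.
    """
    for candidate in candidate_names(name):
        key = candidate.lower()
        if key in aliases:
            return aliases[key]
        try:
            return lookup(candidate)
        except LookupError:
            continue
    return None


def pycountry_lookup(name):
    """Exact, case-insensitive match via pycountry; raises LookupError on a miss."""
    import pycountry  # imported lazily so the helpers above work without it
    return pycountry.countries.lookup(name).alpha_3
```

Calling `resolve_alpha3(country, pycountry_lookup)` for each entry in `missing` would return `None` for whatever is still unresolved. Aggregates such as 'World', 'OECD' or 'Eastern Africa' have no ISO 3166-1 alpha-3 code at all, so they are expected to stay unresolved. Stripping parentheticals can also produce false positives (for example 'Sudan (former)' becomes 'Sudan'), so the results would still need a manual review.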