
With this dataset:

import pandas as pd

df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/total_cases.csv')

I can extract all countries like so:

countries = list(df)

ending up with:

countries = ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia', 'Bonaire Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Congo', 'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Democratic Republic of Congo', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Faeroe Islands', 'Falkland Islands', 'Fiji', 'Finland', 'France', 'French Polynesia', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'International', 'Iran', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kosovo', 'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macedonia', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Mauritania', 'Mauritius', 'Mexico', 'Moldova', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palestine', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Romania', 'Russia', 'Rwanda', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Vincent and the Grenadines', 'San Marino', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Somalia', 'South Africa', 'South Korea', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Swaziland', 'Sweden', 'Switzerland', 'Syria', 'Taiwan', 'Tanzania', 'Thailand', 'Timor', 'Togo', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turks and Caicos Islands', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Virgin Islands', 'Uruguay', 'Uzbekistan', 'Vatican', 'Venezuela', 'Vietnam', 'Zambia', 'Zimbabwe']

and the latest number of cases for each country with:

cases=[]
for item in df:
  if item in countries:
    # most recent is the last
    n = df[item].iloc[-1]
    cases.append(n)

ending up with:

 cases = [367.0, 383.0, 1468.0, 545.0, 17.0, 3.0, 15.0, 1715.0, 853.0, 74.0, 5956, 12640, 717.0, 36.0, 811.0, 164.0, 63.0, 861.0, 22194, 7.0, 26.0, 39.0, 5.0, 210.0, 2.0, 781.0, 7.0, 13717.0, 3.0, 135.0, 577.0, 384.0, 3.0, 117.0, 685.0, 17883, 7.0, 45.0, 9.0, 9.0, 5116.0, 82784, 1780.0, 45.0, 483.0, 349.0, 1282.0, 396.0, 13.0, 494.0, 5017, 180.0, 5071, 121.0, 15.0, 1956.0, 3995.0, 1322.0, 78.0, 16.0, 31.0, 1149.0, 52.0, 184.0, 5.0, 15.0, 2308.0, 78167, 47.0, 30.0, 4.0, 196.0, 103228, 287.0, 113.0, 1832.0, 11.0, 12.0, 121.0, 80.0, 166.0, 144.0, 33.0, 33.0, 25.0, 312.0, 895.0, 1586, 5194.0, 2738.0, nan, 62589, 1031.0, 5709.0, 150.0, 9248.0, 135586, 63.0, 3906, 170.0, 349.0, 704.0, 172.0, 184.0, 743.0, 270.0, 12.0, 548.0, 548.0, 14.0, 19.0, 78.0, 880.0, 2970.0, 599.0, 85.0, 8.0, 3963.0, 19.0, 56.0, 293.0, 6.0, 268.0, 2785.0, 1056.0, 79.0, 15.0, 241.0, 6.0, 1184.0, 10.0, 22.0, 16.0, 9.0, 19580, 18.0, 969.0, 6.0, 278.0, 254.0, 8.0, 5863, 419.0, 4072.0, 260.0, 2249.0, 2.0, 115.0, 2954.0, 3764.0, 4848.0, 12442.0, 573.0, 2057.0, 4417.0, 7497.0, 105.0, 11.0, 14.0, 8.0, 279.0, 2795.0, 237.0, 2447.0, 11.0, 6.0, 1481, 40.0, 581.0, 1055.0, 8.0, 1749.0, 10384, 1.0, 140510, 185.0, 14.0, 10.0, 10.0, 7693, 22164, 19.0, 376.0, 24.0, 2369.0, 1.0, 65.0, 107.0, 596.0, 34109.0, 8.0, 52.0, 1462.0, 2359.0, 55242, 398809, 45.0, 424.0, 504.0, 7.0, 166.0, 251.0, 39.0, 10.0]

Now I need to plot all of this on a map, and for that I need the ISO3 (alpha_3) code for each country. The order of the items is crucial here.

I've found this package, which provides this info for each country:

import pycountry

and if I print(list(pycountry.countries)), I get a list like this:

[Country(alpha_2='AW', alpha_3='ABW', name='Aruba', numeric='533'), Country(alpha_2='AF', alpha_3='AFG', name='Afghanistan', numeric='004', official_name='Islamic Republic of Afghanistan'), Country(alpha_2='AO', alpha_3='AGO', name='Angola', numeric='024', official_name='Republic of Angola'), Country(alpha_2='AI', alpha_3='AIA', name='Anguilla', numeric='660'), Country(alpha_2='AX', alpha_3='ALA', name='Åland Islands', numeric='248'), Country(alpha_2='AL', alpha_3='ALB', name='Albania', numeric='008', official_name='Republic of Albania'), ...]

QUESTION

How can I search this last iterable and end up with an ordered list of alpha_3 codes, one code for each country in my countries list above (and in the same order), like so:

alpha_codes = ['AFG', 'ALB', ...]

PS: countries that are not in the 'countries' list should be discarded from the iterable, and the three final lists must all have the same length.

8-Bit Borges
  • `[i.alpha_3 for i in pycountry.countries]` you can just call the code with a list comp – anky Apr 09 '20 at 04:32

1 Answer

You can create a dictionary:

d = {i.name: i.alpha_3 for i in list(pycountry.countries)}

Then map each name to its code with dict.get; the second argument is the default, so a name with no match is returned unchanged:

countries = [d.get(c, c) for c in countries]
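
Before reaching for fuzzy search, it can help to see which names actually miss the exact lookup. The following is only a minimal sketch: `name_to_alpha3` is a tiny hand-written stand-in for the dictionary built from pycountry, and the helper name `split_matches` is hypothetical.

```python
# Stand-in for {i.name: i.alpha_3 for i in pycountry.countries};
# only a few entries are written out here for illustration.
name_to_alpha3 = {
    "Afghanistan": "AFG",
    "Albania": "ALB",
    "Algeria": "DZA",
    "Russian Federation": "RUS",
}


def split_matches(names, table):
    """Return (codes, misses); codes is aligned with names, None where no match."""
    codes = [table.get(name) for name in names]
    misses = [name for name, code in zip(names, codes) if code is None]
    return codes, misses


codes, misses = split_matches(["Afghanistan", "Albania", "Russia"], name_to_alpha3)
print(codes)   # ['AFG', 'ALB', None]
print(misses)  # ['Russia']
```

Here 'Russia' misses because the ISO name is 'Russian Federation'; names like this are the ones that need either a fuzzy search or a manual entry.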

Unfortunately, many country names do not match, so you can use search_fuzzy instead:

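def look(x):
    try:
        # take the best fuzzy match and return its ISO3 code
        return pycountry.countries.search_fuzzy(x)[0].alpha_3
    except LookupError:
        # no match found: keep the original name
        return x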

countries = [look(c) for c in countries]
print(countries)
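
The question also asks that unmatched countries be dropped and that the three lists keep the same length. One possible way to do that, sketched under the assumption that the lookup returns None (rather than the original name) when no code is found; the helper name `keep_resolved` and the short sample lists are illustrative only:

```python
def keep_resolved(countries, cases, codes):
    """Drop every position whose code is None, keeping the three lists aligned."""
    rows = [(c, n, a) for c, n, a in zip(countries, cases, codes) if a is not None]
    if not rows:
        return [], [], []
    kept_countries, kept_cases, kept_codes = (list(col) for col in zip(*rows))
    return kept_countries, kept_cases, kept_codes


# Sample input: the third entry is assumed to have no ISO code.
sample_countries = ["Afghanistan", "Albania", "International"]
sample_cases = [367.0, 383.0, float("nan")]
sample_codes = ["AFG", "ALB", None]

kept = keep_resolved(sample_countries, sample_cases, sample_codes)
print(kept)  # (['Afghanistan', 'Albania'], [367.0, 383.0], ['AFG', 'ALB'])
```

Filtering the zipped rows in one pass means a country, its case count, and its code are always removed together, so the order and lengths stay consistent.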
jezrael
  • Thanks a lot. The lookup is VERY slow, however; it takes many seconds. Is there a faster solution? – 8-Bit Borges Apr 09 '20 at 15:33
  • @8-BitBorges - I think you can use the dictionary from [this](https://stackoverflow.com/a/41245680/2901002) with `d = {b: a for a, b in Country}` instead of `d = {i.name: i.alpha_3 for i in list(pycountry.countries)}`. Another big advantage is that you can add missing values manually. – jezrael Apr 09 '20 at 15:45
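
The idea in the last comment (a plain dictionary plus manually added entries) could look roughly like the sketch below. Everything here is an assumption for illustration: `name_to_alpha3` stands in for a table built once from a country list, `overrides` holds hand-added names, `slow_lookup` is a placeholder for an expensive call such as search_fuzzy, and 'Atlantis' is a made-up name with no code.

```python
from functools import lru_cache

# Stand-in for an exact-name table built once up front.
name_to_alpha3 = {"Afghanistan": "AFG", "Albania": "ALB", "Korea, Republic of": "KOR"}

# Manual additions for dataset names that differ from the ISO names.
overrides = {"South Korea": "KOR", "Russia": "RUS"}


@lru_cache(maxsize=None)
def slow_lookup(name):
    """Placeholder for an expensive lookup; always returns None in this sketch."""
    return None


def to_alpha3(name):
    # Cheap dictionary lookups first; the slow path runs only for leftovers.
    code = overrides.get(name) or name_to_alpha3.get(name)
    return code if code is not None else slow_lookup(name)


alpha_codes = [to_alpha3(c) for c in ["Afghanistan", "South Korea", "Russia", "Atlantis"]]
print(alpha_codes)  # ['AFG', 'KOR', 'RUS', None]
```

Because dictionary lookups are constant time and the cached fallback is called at most once per distinct name, the slow path is only paid for names that neither table covers.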