With this dataset:
df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/total_cases.csv')
I can extract all countries like so:
countries = list(df)
ending up with:
countries = ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia', 'Bonaire Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Congo', 'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Democratic Republic of Congo', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Faeroe Islands', 'Falkland Islands', 'Fiji', 'Finland', 'France', 'French Polynesia', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'International', 'Iran', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kosovo', 'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macedonia', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Mauritania', 'Mauritius', 'Mexico', 'Moldova', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palestine', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Romania', 'Russia', 'Rwanda', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Vincent and the Grenadines', 'San Marino', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Somalia', 'South Africa', 'South Korea', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Swaziland', 'Sweden', 'Switzerland', 'Syria', 'Taiwan', 'Tanzania', 'Thailand', 'Timor', 'Togo', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turks and Caicos Islands', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Virgin Islands', 'Uruguay', 'Uzbekistan', 'Vatican', 'Venezuela', 'Vietnam', 'Zambia', 'Zimbabwe']
and latest number of cases of each respective country with:
cases=[]
for item in df:
if item in countries:
# most recent is the last
n = df[item].iloc[-1]
cases.append(n)
ending up with:
cases = [367.0, 383.0, 1468.0, 545.0, 17.0, 3.0, 15.0, 1715.0, 853.0, 74.0, 5956, 12640, 717.0, 36.0, 811.0, 164.0, 63.0, 861.0, 22194, 7.0, 26.0, 39.0, 5.0, 210.0, 2.0, 781.0, 7.0, 13717.0, 3.0, 135.0, 577.0, 384.0, 3.0, 117.0, 685.0, 17883, 7.0, 45.0, 9.0, 9.0, 5116.0, 82784, 1780.0, 45.0, 483.0, 349.0, 1282.0, 396.0, 13.0, 494.0, 5017, 180.0, 5071, 121.0, 15.0, 1956.0, 3995.0, 1322.0, 78.0, 16.0, 31.0, 1149.0, 52.0, 184.0, 5.0, 15.0, 2308.0, 78167, 47.0, 30.0, 4.0, 196.0, 103228, 287.0, 113.0, 1832.0, 11.0, 12.0, 121.0, 80.0, 166.0, 144.0, 33.0, 33.0, 25.0, 312.0, 895.0, 1586, 5194.0, 2738.0, nan, 62589, 1031.0, 5709.0, 150.0, 9248.0, 135586, 63.0, 3906, 170.0, 349.0, 704.0, 172.0, 184.0, 743.0, 270.0, 12.0, 548.0, 548.0, 14.0, 19.0, 78.0, 880.0, 2970.0, 599.0, 85.0, 8.0, 3963.0, 19.0, 56.0, 293.0, 6.0, 268.0, 2785.0, 1056.0, 79.0, 15.0, 241.0, 6.0, 1184.0, 10.0, 22.0, 16.0, 9.0, 19580, 18.0, 969.0, 6.0, 278.0, 254.0, 8.0, 5863, 419.0, 4072.0, 260.0, 2249.0, 2.0, 115.0, 2954.0, 3764.0, 4848.0, 12442.0, 573.0, 2057.0, 4417.0, 7497.0, 105.0, 11.0, 14.0, 8.0, 279.0, 2795.0, 237.0, 2447.0, 11.0, 6.0, 1481, 40.0, 581.0, 1055.0, 8.0, 1749.0, 10384, 1.0, 140510, 185.0, 14.0, 10.0, 10.0, 7693, 22164, 19.0, 376.0, 24.0, 2369.0, 1.0, 65.0, 107.0, 596.0, 34109.0, 8.0, 52.0, 1462.0, 2359.0, 55242, 398809, 45.0, 424.0, 504.0, 7.0, 166.0, 251.0, 39.0, 10.0]
Now I need to plot all of this on a map, and for that I need ISO3
(alpha_3) code for each country. Order of items is crucial here.
Now, I've found this package which provides this info for each country:
import pycountry
and if I print (list(pycountry.countries))
, I get an iterable like this:
[Country(alpha_2='AW', alpha_3='ABW', name='Aruba', numeric='533'), Country(alpha_2='AF', alpha_3='AFG', name='Afghanistan', numeric='004', official_name='Islamic Republic of Afghanistan'), Country(alpha_2='AO', alpha_3='AGO', name='Angola', numeric='024', official_name='Republic of Angola'), Country(alpha_2='AI', alpha_3='AIA', name='Anguilla', numeric='660'), Country(alpha_2='AX', alpha_3='ALA', name='Åland Islands', numeric='248'), Country(alpha_2='AL', alpha_3='ALB', name='Albania', numeric='008', official_name='Republic of Albania'), ...]
QUESTION
How can I search this last iterable and end up with an ordered list of alpha_3
codes, one code for each country in my countries list above (and in the same order), like so:
alpha_codes = ['AFG', 'ALB', ...]
Ps: countries that are not on the list 'countries' should be discarted from the iterable and the length of the three final lists must be the same.