I need to generate a list of populations for different years for each country. The information I need is contained in two dataframes.
The first dataframe, gni_per_capita, contains names of countries and years. Each country in this dataframe covers a different range of years.
The second dataframe, hihd, also has names of countries and dates, but its list of countries is more extensive, and it covers a wider range of dates for each country. The second dataframe contains the population of each country in each year; the first does not.
I need to generate a list of population for every year of each of the countries in the first dataframe.
I was given the following tip:
1. First, get a unique list of countries from gni_per_capita.
2. Loop through the list, and get the available years for that country.
3. Then .loc index hihd to get the population rows where both the country and years are correct (hihd.Year.isin(?)).
4. Append these to the list one by one.
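
For concreteness, the four steps of the tip could be sketched roughly as follows. This is only a minimal sketch on toy data: the column names 'Entity' and 'Population' in hihd are assumptions (only hihd.Year and gni_per_capita's 'Entity' and 'Year' appear above), and the sample values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the real dataframes. The 'Entity' and 'Population'
# column names in hihd are assumptions, and the values are invented.
gni_per_capita = pd.DataFrame({
    "Entity": ["A", "A", "B"],
    "Year": [2000, 2001, 2001],
})
hihd = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B", "C"],
    "Year": [1999, 2000, 2001, 2000, 2001, 2001],
    "Population": [10, 11, 12, 20, 21, 30],
})

populations = []
# Step 1: unique list of countries from gni_per_capita.
for country in gni_per_capita["Entity"].unique():
    # Step 2: years available for this country in gni_per_capita.
    years = gni_per_capita.loc[gni_per_capita["Entity"] == country, "Year"].unique()
    # Step 3: rows of hihd where both the country and the year match.
    mask = (hihd["Entity"] == country) & hihd["Year"].isin(years)
    # Step 4: append the matching populations to the list.
    populations.extend(hihd.loc[mask, "Population"].tolist())

print(populations)
```

Note that the year filter is applied per country inside the loop, so a year that exists for one country does not pull in rows for another.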
Thus far, I have created a series, indexed by country, that holds the years available for each country in the first dataframe:
group = gni_per_capita.groupby('Entity')
ync = group.apply(lambda x: x['Year'].unique())
However, I am struggling to combine the second dataframe with the created series:
mask = hihd.Year.isin(ync)
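
One possible reason the mask above does not behave as hoped is that ync holds one array of years per country, whereas isin expects a flat collection of values and has no notion of which country each year belongs to. A loop-free alternative would be to merge on both columns at once. The following is only a sketch on toy data, assuming hihd has 'Entity' and 'Population' columns (names not confirmed above) and using invented values.

```python
import pandas as pd

# Toy stand-ins for the real dataframes. The 'Entity' and 'Population'
# column names in hihd are assumptions, and the values are invented.
gni_per_capita = pd.DataFrame({
    "Entity": ["A", "A", "B"],
    "Year": [2000, 2001, 2001],
})
hihd = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B", "C"],
    "Year": [1999, 2000, 2001, 2000, 2001, 2001],
    "Population": [10, 11, 12, 20, 21, 30],
})

# Every distinct (country, year) pair present in gni_per_capita.
pairs = gni_per_capita[["Entity", "Year"]].drop_duplicates()

# Left merge keeps every pair from gni_per_capita and attaches the
# population from hihd where the same country and year exist.
merged = pairs.merge(
    hihd[["Entity", "Year", "Population"]],
    on=["Entity", "Year"],
    how="left",
)

print(merged)
```

With how="left", any pair missing from hihd would show up with a NaN population rather than being silently dropped, which makes gaps easy to spot.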