I am fairly new to pandas and feel there should be a more efficient way to compute, for each country, the change in a value between the earliest and latest year (1990 and 2015 in my case) without looping over the countries as I do now. I would like to vectorize the code. Maybe it is just the way the dataset is organized, but I have been struggling to find a vectorized solution.
Does anyone know an efficient way to do this without iterating over the countries? My current code is below, followed by a sample of the dataset.
import pandas as pd

new_columns = ['CountryName', 'Forest Area Change']
rows = []
for country in countries:
    # Forest area (% of land area) for this country in 1990 and in 2015
    forest_area_1990 = df[(df.CountryName == country) & (df.IndicatorCode == 'AG.LND.FRST.ZS') & (df.Year == 1990)].Value.values
    forest_area_2015 = df[(df.CountryName == country) & (df.IndicatorCode == 'AG.LND.FRST.ZS') & (df.Year == 2015)].Value.values
    # Skip countries that are missing either year
    if forest_area_1990.size > 0 and forest_area_2015.size > 0:
        rows.append({new_columns[0]: country, new_columns[1]: forest_area_2015[0] - forest_area_1990[0]})
# Build the result frame once from the collected rows
dff = pd.DataFrame(rows, columns=new_columns)
The dataset looks like the following:
          CountryName    CountryCode  IndicatorName                 IndicatorCode   Year  Value
11531340  Canada         CAN          Forest area (% of land area)  AG.LND.FRST.ZS  1990  38.299073
21041940  Canada         CAN          Forest area (% of land area)  AG.LND.FRST.ZS  2015  38.166671
11777751  United States  USA          Forest area (% of land area)  AG.LND.FRST.ZS  1990  33.022308
21288351  United States  USA          Forest area (% of land area)  AG.LND.FRST.ZS  2015  33.899723
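
For reference, here is a minimal sketch of what a loop-free version might look like, using only the four sample rows above. It assumes each (country, year) pair appears at most once for the indicator, because `pivot` raises an error on duplicates. The names `mask`, `wide`, and `dff_first_last` are illustrative and not part of my original code. The second variant is a guess at the more general case, where the first and last available year differ per country.

```python
import pandas as pd

# Sample rows copied from the dataset excerpt above (index = original row labels).
df = pd.DataFrame(
    {
        "CountryName": ["Canada", "Canada", "United States", "United States"],
        "CountryCode": ["CAN", "CAN", "USA", "USA"],
        "IndicatorName": ["Forest area (% of land area)"] * 4,
        "IndicatorCode": ["AG.LND.FRST.ZS"] * 4,
        "Year": [1990, 2015, 1990, 2015],
        "Value": [38.299073, 38.166671, 33.022308, 33.899723],
    },
    index=[11531340, 21041940, 11777751, 21288351],
)

# Variant 1: fixed years. Keep only the indicator and the two years of interest.
mask = (df["IndicatorCode"] == "AG.LND.FRST.ZS") & df["Year"].isin([1990, 2015])

# Reshape to one row per country and one column per year.
# reindex guarantees both year columns exist even if one year is absent from the data.
wide = (
    df[mask]
    .pivot(index="CountryName", columns="Year", values="Value")
    .reindex(columns=[1990, 2015])
)

# Column-wise subtraction; countries missing either year yield NaN and are dropped,
# which mirrors the size checks in the loop.
dff = (wide[2015] - wide[1990]).dropna().rename("Forest Area Change").reset_index()
print(dff)

# Variant 2 (assumption: "change" means latest available year minus earliest
# available year, per country). Sort by year, then take the last and first value per group.
forest = df[df["IndicatorCode"] == "AG.LND.FRST.ZS"].sort_values("Year")
by_country = forest.groupby("CountryName")["Value"]
dff_first_last = (
    (by_country.last() - by_country.first())
    .rename("Forest Area Change")
    .reset_index()
)
print(dff_first_last)
```

On the sample rows both variants give the same two-row frame, since every country has exactly the years 1990 and 2015. On the full dataset they can differ: variant 1 drops countries missing either year, while variant 2 keeps every country that has at least one observation.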