I have a dataframe of population data that looks like this, where every row is the population of a county taken at a specific date
County Date Population
Alba 1900-1-1 1094
Alba 1900-2-1 1107
Alba 1900-3-1 1120
Belfor 1900-1-1 9756
Belfor 1900-3-1 9976
...
I want to get a dataframe where the index is the year, the columns are the county names, and the values are the average population in a given year.
This is what my code looks like currently
#year
df['Year'] = pd.DatetimeIndex(df['date']).year.fillna(0).astype(int)
##create aggregation on year, county of population
new_df = df.groupby(['Year','County']).mean()
#turn county into column
new_df.reset_index(level=1, inplace=True)
##pivot dataframe
new_df.pivot(columns='Country',values='Population')
This has proven extraordinarily slow since there are several entries per year for hundreds of years. What can I do instead to make this run faster