I have a dataframe that looks like this:
df = pd.DataFrame( [
['BILING',2017,7,1406 ],
['BILWPL',2017,7,199],
['BKCLUB',2017,7,9417],
['LEAVEN',2017,7,4773 ],
['MAILORDER',2017,7,10487]
], columns=['Branch','Year','Month','count']
df
Out[1]:
Branch Year Month count
0 BILING 2017 7 1406
1 BILWPL 2017 7 199
2 BKCLUB 2017 7 9417
10 LEAVEN 2017 7 4773
18 MAILORDER 2017 7 10487
It contains the same month but different years so that one can compare the time of year across time.
The desired output would look something like:
Branch Month 2017 2019 Mean(ave) percent_diff
BILING 7 1406 1501 1480 5%
BILWPL 7 199 87 102 -40%
BKCLUB 7 9417 8002 7503 -3%
LEAVEN 7 4773 5009 4509 -15%
MAILORDER 7 10487 11032 9004 8%
My question is how to aggregate based on branch to display across and add 2 columns: mean and percent difference between mean and newest year.
**** UPDATE **** This is close but is missing some columns [ Thanks G. Anderson ]:
df.pivot_table(
values='count', index='Branch', columns='Year',
fill_value=0, aggfunc='mean')
Produces:
Year 2017 2018 2019
Branch
BILING 1406 1280 4
BILWPL 199 117 239
BKCLUB 94 161 238
This is very close but I'm hoping to tack on columns corresponding to the mean, and percent difference.
* UPDATE 2 *
circ_pivot = df.pivot_table(
values='count', index='Branch', columns='Year',
fill_value=0)
circ_pivot['Mean'] = circ_pivot[[2017,2018,2019]].mean(axis=1)
circ_pivot['Change'] = ((circ_pivot[2019] - circ_pivot[2018]) / circ_pivot[2018]) * 100
circ_pivot['Change_mean'] = ((circ_pivot[2019] - circ_pivot['Mean']) / circ_pivot['Mean']) * 100
Output:
Year 2017 2018 2019 Mean Change Change_mean
Branch
BILING 1406 1280 4 896.666667 -99.687500 -99.553903
BILWPL 199 117 239 185.000000 104.273504 29.189189
BKCLUB 94 161 238 164.333333 47.826087 44.827586