I have a pandas DataFrame:
| | ID | MONTH | TOTAL |
---|---|---|---|
0 | REF1 | 1 | 500 |
1 | REF1 | 2 | 501 |
2 | REF1 | 3 | 620 |
3 | REF2 | 8 | 5001 |
4 | REF2 | 9 | 5101 |
5 | REF2 | 10 | 5701 |
6 | REF2 | 11 | 7501 |
7 | REF2 | 7 | 6501 |
8 | REF2 | 6 | 1501 |
For each ID, I need to compare each month's TOTAL with that same ID's TOTAL from the previous month.
At the moment I can calculate the difference from the row above, but that comparison doesn't take the ID/MONTH into account. Would this need a loop?
I have tried the code below, but it returns NaN in every cell of the 'Variance' and 'Variance%' columns:
```python
df_all.sort_values(['ID', 'MONTH'], inplace=True)
df_all['Variance'] = df_all['TOTAL'] - df_all.groupby(['ID', 'MONTH'])['TOTAL'].shift()
df_all['Variance%'] = df_all['TOTAL'] - df_all.groupby(['ID', 'MONTH'])['TOTAL'].pct_change()
```
The desired outcome is:
| | ID | MONTH | TOTAL | Variance | Variance % |
---|---|---|---|---|---|
0 | REF1 | 1 | 500 | 0 | 0 |
1 | REF1 | 2 | 501 | 1 | 0.2 |
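
One possible reading of the NaN result: if every ID/MONTH pair is unique, grouping by both columns puts each row in its own group, so there is never a previous row to shift from. The sketch below groups by ID only, so no explicit loop is needed. It makes several assumptions: the sample data is rebuilt by hand from the table above, "previous month" means the previous row within each ID after sorting (gaps between months are not handled), the first month of each ID is set to 0 to match the desired table, and 'Variance%' is expressed in percent rounded to one decimal.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df_all = pd.DataFrame({
    "ID": ["REF1"] * 3 + ["REF2"] * 6,
    "MONTH": [1, 2, 3, 8, 9, 10, 11, 7, 6],
    "TOTAL": [500, 501, 620, 5001, 5101, 5701, 7501, 6501, 1501],
})

# Sort so that, within each ID, rows are in month order.
df_all = df_all.sort_values(["ID", "MONTH"])

# Group by ID only: each row is compared with the previous row of the same ID.
grouped = df_all.groupby("ID")["TOTAL"]
variance = grouped.diff()                  # TOTAL minus previous TOTAL
variance_pct = grouped.pct_change() * 100  # relative change, in percent

# Assumption: the first month of each ID has nothing to compare with, so use 0.
df_all["Variance"] = variance.fillna(0)
df_all["Variance%"] = variance_pct.fillna(0).round(1)

print(df_all)
```

If months can be missing for an ID, this compares against the last available month rather than the calendar-previous one, so that case would need extra handling.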