So I know how to create a new column based on the difference between consecutive columns, here. But I want to do this for multiple subsets of a dataframe, i.e, for a .groupby('zip')
zip year val
0 48123 2013 10
1 48123 2014 11
2 48123 2015 11
3 60122 2013 13
4 60122 2014 10
5 60122 2015 10
Would produce
zip year val dVal
0 48123 2013 10 1
1 48123 2014 11 0
2 48123 2015 11 NaN
3 60122 2013 13 -3
4 60122 2014 10 1
5 60122 2015 11 NaN
Right now I'm looping over my dataframe using
for index, group in df.groupby('zip'):
group.loc['dVal'] = group['val'].shift(-1) - group['val']
But I have over a million groups, so it's taking a while, is there a better way to be doing this?