I have the following df
:
df = pd.DataFrame({'block': [1, 1, 1, 2, 2, 2, 3, 3, 3],
'B': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
'C': [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000],
})
Which outputs:
block B C
0 1 a 1000
1 1 b 2000
2 1 c 3000
3 2 a 4000
4 2 b 5000
5 2 c 6000
6 3 a 7000
7 3 b 8000
8 3 c 9000
I want to output a diff_column
(could also be an entirely new_df
) who's values are the difference between the values in the C
column grouped by the B
column for every different value in the block
column.
That is, what's the difference between a
's value in block
2 and a
's value in block
1?
Example for a
:
4000 - 1000 = 3000
Note, the rows for block
== 1 would be empty since there is no other previous block
.
I have tried:
df['diff_column'] = df.groupby(['block', 'B'])['C'] - df.shift[-1].groupby(['block', 'B'])['C']
with no success.