Generate new column in Pandas from existing column divided into groups

Question

From a pandas dataframe like this:

                    data
cat     order
a       1            10
a       2            17
a       3            11
a       4            14
a       5            17
b       1            20
b       2            22
b       3            26
b       4            23
b       5            24

I want to create a new column defined by new_data[n] = data[n] - data[n-1]. The rows belong to several categories ('cat'), each ordered by 'order', and the first row of every category should be 0.

The end dataframe should look like this:

                    data  new_data
cat     order
a       1            10      0
a       2            17      7
a       3            11     -6
a       4            14      3
a       5            17      3
b       1            20      0
b       2            22      2
b       3            26      4
b       4            23     -3
b       5            24      1

I can't find a way to do it. Any help would be welcome.

If `cat` is your index: `df.groupby(level=0).diff().fillna(0)`, if it is a column: `df.groupby('cat').diff().fillna(0)` — Erfan, Apr 09 '20 at 17:20
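A minimal sketch of the comment's suggestion for the indexed case, assuming the frame from the question has a two-level index named `cat` and `order` and is already sorted by `order` within each category. The construction of `df` is only there to make the example self-contained; the final `astype(int)` is an optional extra to get integers back after `fillna(0)`.

```python
import pandas as pd

# Rebuild the frame from the question with a (cat, order) MultiIndex.
index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2, 3, 4, 5]], names=["cat", "order"]
)
df = pd.DataFrame(
    {"data": [10, 17, 11, 14, 17, 20, 22, 26, 23, 24]}, index=index
)

# diff() is computed inside each 'cat' group, so the first row of every
# group has no predecessor and comes back as NaN; fillna(0) turns it into 0.
df["new_data"] = df.groupby(level="cat")["data"].diff().fillna(0).astype(int)

print(df)
```

Selecting `["data"]` before `diff()` keeps the result a Series, so it can be assigned directly to the new column.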

score 0 · Answer 1 · answered Apr 09 '20 at 17:20

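You can group by `cat` and then use `transform` with a two-row rolling window that subtracts the previous value from the current one. This assumes `cat` is a regular column and the rows are already sorted by `order` within each category: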

df['new_data'] = df.groupby('cat')['data'].transform(lambda x: x.rolling(2).apply(lambda x: x.iloc[1]-x.iloc[0])).fillna(0)

df
  cat  order  data  new_data
0   a      1    10       0.0
1   a      2    17       7.0
2   a      3    11      -6.0
3   a      4    14       3.0
4   a      5    17       3.0
5   b      1    20       0.0
6   b      2    22       2.0
7   b      3    26       4.0
8   b      4    23      -3.0
9   b      5    24       1.0
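For comparison, here is a sketch that runs the rolling-window version next to a plain grouped `diff()`, assuming `cat` and `order` are ordinary columns. The frame is rebuilt from the question's data, and the names `rolled`, `diffed` and `same` are just for illustration. On this data the two approaches should give the same values, so the shorter `diff()` form may be enough.

```python
import pandas as pd

# Data from the question, with 'cat' and 'order' as regular columns.
df = pd.DataFrame({
    "cat": ["a"] * 5 + ["b"] * 5,
    "order": [1, 2, 3, 4, 5] * 2,
    "data": [10, 17, 11, 14, 17, 20, 22, 26, 23, 24],
})

# Make sure rows are in the intended order before differencing.
df = df.sort_values(["cat", "order"])

# Rolling-window version from the answer: second value minus first
# in every window of two rows, per group.
rolled = df.groupby("cat")["data"].transform(
    lambda s: s.rolling(2).apply(lambda w: w.iloc[1] - w.iloc[0])
).fillna(0)

# Grouped diff: current row minus previous row, per group.
diffed = df.groupby("cat")["data"].diff().fillna(0)

# True when both approaches agree on every row.
same = bool((rolled == diffed).all())

df["new_data"] = diffed
print(same)
print(df)
```

Both results are floats because the first row of each group is NaN before `fillna(0)`; add `.astype(int)` if you want integers.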
