
From a pandas dataframe like this:

                    data
cat     order
a       1            10
a       2            17
a       3            11
a       4            14
a       5            17
b       1            20
b       2            22
b       3            26
b       4            23
b       5            24

I want to create a new column defined by new_data[n] = data[n] - data[n-1]. The frame has several categories ('cat'), each ordered by 'order', and the first row of every category should be 0.

The end dataframe should look like this:

                    data  new_data
cat     order
a       1            10      0
a       2            17      7
a       3            11     -6
a       4            14      3
a       5            17      3
b       1            20      0
b       2            22      2
b       3            26      4
b       4            23     -3
b       5            24      1

I can't find a way to do it. Any help would be welcome.

Ilya

1 Answer


You can group by 'cat' and then use transform with a two-row rolling window, filling each group's first row with 0:

df['new_data'] = df.groupby('cat')['data'].transform(lambda s: s.rolling(2).apply(lambda w: w.iloc[1] - w.iloc[0])).fillna(0)

df
  cat  order  data  new_data
0   a      1    10       0.0
1   a      2    17       7.0
2   a      3    11      -6.0
3   a      4    14       3.0
4   a      5    17       3.0
5   b      1    20       0.0
6   b      2    22       2.0
7   b      3    26       4.0
8   b      4    23      -3.0
9   b      5    24       1.0
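
As a possible alternative, the same per-category difference can probably be written more directly with the groupby diff() method. The sketch below is not part of the original answer; it assumes 'cat' and 'order' are ordinary columns (as in the output above) and that an integer result is wanted.

```python
import pandas as pd

# Sample data from the question, with 'cat' and 'order' as ordinary columns
# (an assumption; the question shows them as index levels).
df = pd.DataFrame({
    "cat": list("aaaaabbbbb"),
    "order": [1, 2, 3, 4, 5] * 2,
    "data": [10, 17, 11, 14, 17, 20, 22, 26, 23, 24],
})

# Make sure each category's rows follow 'order' before differencing.
df = df.sort_values(["cat", "order"])

# diff() within each group computes data[n] - data[n-1]; the first row of
# each group has no predecessor and comes back as NaN, which becomes 0.
df["new_data"] = df.groupby("cat")["data"].diff().fillna(0).astype(int)

print(df)
```

If 'cat' and 'order' are index levels, as in the question, grouping with df.groupby(level='cat') should work the same way.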

Bruno Mello