Create a column of differences in 'col2' for each item in 'col1'

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
    'col2': [9.6, 10.4, 11.2, 3.3, 6, 4, 1.94, 15.44, 6.17, 8.16]
})

It displays as:

    col1    col2
0   A   9.60
1   A   10.40
2   A   11.20
3   B   3.30
4   B   6.00
5   B   4.00
6   C   1.94
7   C   15.44
8   C   6.17
9   D   8.16

I want to get the following output:

    col1    col2    Diff
0   A   9.60    0
1   A   10.40   0.80
2   A   11.20   0.80
3   B   3.30    0
4   B   6.00    2.70
5   B   4.00    -2.00
6   C   1.94    0
7   C   15.44   13.50
8   C   6.17    -9.27
9   D   8.16    0

I tried df['col2'].diff(), but it calculates the differences across all values in col2, whereas I want them calculated separately for each item in col1.

Any help would be highly appreciated, thanks.

Have you tried to group by col1? https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.diff.html — TiTo, Dec 01 '22 at 14:42
@TiTo I don't think that groupby works, since it doesn't provide `difference` option — Khaled DELLAL, Dec 01 '22 at 14:43

score 2 · Answer 1 · answered Dec 01 '22 at 14:44

You need a groupby. I think this works:

df.insert(2, 'Diff', df.groupby('col1')['col2'].diff())

Result:

    col1    col2    Diff
0   A   9.60    NaN
1   A   10.40   0.80
2   A   11.20   0.80
3   B   3.30    NaN
4   B   6.00    2.70
5   B   4.00    -2.00
6   C   1.94    NaN
7   C   15.44   13.50
8   C   6.17    -9.27
9   D   8.16    NaN

(You can replace the NaN values with 0 if you wish, for example with fillna(0).)

grymlin

Cleaner than my suggestion, nice! – ImotVoksim Dec 01 '22 at 14:56

score 1 · Answer 2 · answered Dec 01 '22 at 14:44

How about this:

differences = []

for val in df.col1.unique():
    diffs = df.loc[df.col1 == val].col2.diff()
    differences.extend(diffs)

Then you can add the differences list as a new column, e.g. df['Diff'] = differences.
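
One caveat with the loop above: the list is built group by group, so it only lines up with the rows when each col1 value sits in one contiguous block, as it does in the question's data. A minimal sketch of a variant that keeps each group's differences as a Series and lets pandas align them on the index instead (the small interleaved frame is made up purely for illustration):

```python
import pandas as pd

# Illustrative data only: the groups are deliberately interleaved.
df = pd.DataFrame({
    'col1': ['A', 'B', 'A', 'B'],
    'col2': [1.0, 10.0, 4.0, 15.0],
})

# Each diff() result is a Series that keeps the original row labels.
parts = [df.loc[df.col1 == val, 'col2'].diff() for val in df.col1.unique()]

# Assigning a Series aligns on the index, so every difference lands on its own row.
df['Diff'] = pd.concat(parts)
print(df)
```

Here rows 0 and 1 get NaN (first row of each group) and rows 2 and 3 get 3.0 and 5.0. For anything beyond a quick check, the groupby approach in the other answers is probably the simpler choice.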

ImotVoksim

score 1 · Accepted Answer · answered Dec 01 '22 at 14:45

You can use groupby() and diff() and assign the result to the new column Diff. Then you only need to call fillna(0):

df['Diff'] = df.groupby('col1')['col2'].diff().fillna(0)

That should solve your problem:

    col1    col2    Diff
0   A   9.60    0.00
1   A   10.40   0.80
2   A   11.20   0.80
3   B   3.30    0.00
4   B   6.00    2.70
5   B   4.00    -2.00
6   C   1.94    0.00
7   C   15.44   13.50
8   C   6.17    -9.27
9   D   8.16    0.00
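
For completeness, here is a self-contained sketch of the same idea using the data from the question. It also shows what the grouped diff() amounts to: each value minus the previous value within the same col1 group, which is why the first row of every group starts out as NaN. The variable name alt is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
    'col2': [9.6, 10.4, 11.2, 3.3, 6, 4, 1.94, 15.44, 6.17, 8.16],
})

# Difference to the previous row within each col1 group; the first row of
# each group has no predecessor, so its NaN is replaced by 0.
df['Diff'] = df.groupby('col1')['col2'].diff().fillna(0)

# Same computation spelled out: value minus the previous value in the group.
alt = (df['col2'] - df.groupby('col1')['col2'].shift()).fillna(0)

# Rounding only tidies floating-point noise for display.
print(df.round(2))
```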
