
I have the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
    'col2': [9.6, 10.4, 11.2, 3.3, 6, 4, 1.94, 15.44, 6.17, 8.16]
})

It displays as:

  col1   col2
0    A   9.60
1    A  10.40
2    A  11.20
3    B   3.30
4    B   6.00
5    B   4.00
6    C   1.94
7    C  15.44
8    C   6.17
9    D   8.16

I want to get the following output:

  col1   col2   Diff
0    A   9.60      0
1    A  10.40   0.80
2    A  11.20   0.80
3    B   3.30      0
4    B   6.00   2.70
5    B   4.00  -2.00
6    C   1.94      0
7    C  15.44  13.50
8    C   6.17  -9.27
9    D   8.16      0

I tried diff(), but df['col2'].diff() calculates the differences across all values in col2, whereas I want them computed separately for each value of col1.

Any help from your side will be highly appreciated, thanks.

Khaled DELLAL

3 Answers


You need a groupby. I think this works:

df.insert(2, 'Diff', (df.groupby('col1')['col2'].diff()))

Result:

  col1   col2   Diff
0    A   9.60    NaN
1    A  10.40   0.80
2    A  11.20   0.80
3    B   3.30    NaN
4    B   6.00   2.70
5    B   4.00  -2.00
6    C   1.94    NaN
7    C  15.44  13.50
8    C   6.17  -9.27
9    D   8.16    NaN

(You can replace the NaN values with 0 if you wish.)
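
To see what the grouped diff is doing, here is a small sketch (the names grouped, via_diff and via_shift are just illustrative). It assumes the question's df and compares groupby(...).diff() with subtracting the previous value from the same group by hand. The first row of every group has no predecessor, which is where the NaN comes from.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
    'col2': [9.6, 10.4, 11.2, 3.3, 6, 4, 1.94, 15.44, 6.17, 8.16],
})

grouped = df.groupby('col1')['col2']

# Built-in: difference to the previous row within the same col1 group
via_diff = grouped.diff()

# By hand: shift each group down by one row, then subtract
via_shift = df['col2'] - grouped.shift(1)

# Both give NaN on the first row of each group and the same values elsewhere
print(pd.DataFrame({'diff': via_diff, 'shift': via_shift}))
```

Both columns keep the original row index, so either one can be assigned straight back to df.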

grymlin

How about this:

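differences = []

# Compute diff() separately for the rows of each distinct col1 value
for val in df.col1.unique():
    diffs = df.loc[df.col1 == val].col2.diff()
    differences.extend(diffs)  # the first entry of each group is NaN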

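Then you can add the differences list as a new column, for example df['Diff'] = differences. Note that the list is ordered group by group, so it only lines up with the rows when each col1 value occupies one contiguous block, as it does here.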
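
If the groups might be interleaved, a variant of the loop can keep each piece as a Series instead of flattening it into a list, so the row index travels with the values. This is only a sketch: the mixed frame below is made-up data, not from the question, and pieces is an illustrative name.

```python
import pandas as pd

# Hypothetical frame whose col1 groups are interleaved (not the question's data)
mixed = pd.DataFrame({
    'col1': ['A', 'B', 'A', 'B'],
    'col2': [1.0, 10.0, 4.0, 12.0],
})

pieces = []
for val in mixed['col1'].unique():
    # Each piece is a Series that still carries the original row labels
    pieces.append(mixed.loc[mixed['col1'] == val, 'col2'].diff())

# Column assignment aligns on the index, so the order of the pieces does not matter
mixed['Diff'] = pd.concat(pieces)
print(mixed)
```

For larger frames the groupby version is usually the simpler choice, since it does the same per-group work without an explicit Python loop.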

ImotVoksim

You can use groupby() and diff() and assign the result to the new column Diff. Then you only need fillna(0):

df['Diff'] = df.groupby('col1')['col2'].diff().fillna(0)

That should solve your problem:

  col1   col2   Diff
0    A   9.60   0.00
1    A  10.40   0.80
2    A  11.20   0.80
3    B   3.30   0.00
4    B   6.00   2.70
5    B   4.00  -2.00
6    C   1.94   0.00
7    C  15.44  13.50
8    C   6.17  -9.27
9    D   8.16   0.00
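
One small caveat: the table shows values rounded for display. In binary floating point, 10.4 - 9.6 is not exactly 0.8, so if you need to compare or export the numbers you may want to round them. A minimal sketch, assuming two decimals is the precision you want:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
    'col2': [9.6, 10.4, 11.2, 3.3, 6, 4, 1.94, 15.44, 6.17, 8.16],
})

# Per-group difference, 0 for the first row of each group, rounded to 2 decimals
df['Diff'] = df.groupby('col1')['col2'].diff().fillna(0).round(2)
print(df)
```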
TiTo