New row which is the difference of two rows using Groupby in a pandas dataframe

Question

I have a dataframe

import pandas as pd
df = pd.DataFrame({ 
'ID': [1, 1, 1, 2, 2, 2], 
'value': [100, 120, 130, 200, 190, 210],
'value2': [2100, 2120, 2130, 2200, 2190, 2210],   
'state': ['init','mid', 'final', 'init', 'mid', 'final'], 
})

I want to add another row, which is the difference of 'final' and 'init' in the state column for each ID.

The result should look like

import pandas as pd
df = pd.DataFrame({ 
    'ID': [1, 1, 1, 2, 2, 2, 1, 2], 
    'value': [100, 120, 130, 200, 190, 210, 30, 10],
    'value2': [2100, 2120, 2130, 2200, 2190, 2210, 100, 10],   
     'state': ['init','mid', 'final', 'init', 'mid', 'final', 'diff', 'diff'], 
 })

I have tried the following. But I get AttributeError: 'function' object has no attribute 'groupby' error

df1 = df.copy()

df1 = df[df.state.isin(['final', 'init'])].copy
s = df1.groupby('ID', sort=False).diff().dropna.assign(id=df1['ID'].unique(), state='diff')
df = df.append(s, sort=True).sort_values("ID")

tdy · Accepted Answer · 2021-03-06T23:31:53.643

What about something like this? Group by ID, keep only the value* cols, take the diff, and reassign ID and state:

df

	ID	value	value2	state
0	1	100	2100	init
1	1	120	2120	mid
2	1	130	2130	final
3	2	200	2200	init
4	2	190	2190	mid
5	2	210	2210	final

diffs = (df[df.state.isin(['init', 'final'])]
    .groupby('ID')[['value', 'value2']]
    .diff().dropna()
    .assign(ID=df.ID.unique(), state='diff')
)

	ID	value	value2	state
0	1	30.0	30.0	diff
1	2	10.0	10.0	diff

Then combine back into df:

df = df.append(diffs)

	ID	value	value2	state
0	1	100.0	2100.0	init
1	1	120.0	2120.0	mid
2	1	130.0	2130.0	final
3	2	200.0	2200.0	init
4	2	190.0	2190.0	mid
5	2	210.0	2210.0	final
0	1	30.0	30.0	diff
1	2	10.0	10.0	diff

score 1 · Answer 2 · answered Mar 06 '21 at 22:52

1

Another option is to take the difference betweent the final state and the init as a matrix operation and then fill the missing columns ID and state passing a dictionary of values.

val_cols = ['value', 'value2']
df_final = df.query('state == "final"')[val_cols].reset_index(drop=True) - df.query('state== "init"')[val_cols].reset_index(drop=True)

df_final = pd.concat([df, df_final])

values = {'ID': {i:i+1 for i, v in enumerate(df.ID.unique())}, 'state': 'diff'}
df_final.fillna(values, inplace=True)

Final Result

answered Mar 06 '21 at 22:52

Miguel Trejo

5,913
5
24
49

This solution works well in this example. But if the IDs were not non numeric or were not sequentially arranged, values = {'ID': {i:i+1 for i, v in enumerate(df.ID.unique())}, 'state': 'diff'} does not give the right output for ID / group. How might one fix that? – postcolonialist Mar 14 '21 at 19:49
1

yes @postcolonialist, in that case sorting `df_final` by id and state before applying the operations should work – Miguel Trejo Mar 16 '21 at 14:19

New row which is the difference of two rows using Groupby in a pandas dataframe

2 Answers2