0

I have a dataframe looks like:

user_id date balance
A 1 100
A 2 200
A 3 250
B 2 300
B 3 200

I would like to create a new column which is the difference of balance today and previous day for each user.

user_id date balance balance_previous_day
A 1 100 NaN
A 2 200 100
A 3 250 50
B 2 300 NaN
B 3 200 -100

I tried:

df.groupby (['user_id']).apply (lambda x: np.diff (x.balance, 1))

And the result will be:

user_id
A [100, 50]
B [-100]

The result is correct but I don't know how to put back the result to the dataframe.

mommomonthewind
  • 4,390
  • 11
  • 46
  • 74

1 Answers1

2

To get the previous day value, you need shift, not diff. GroupBy does the rest:

>>> df['balance_previous_day'] = df.groupby('user_id')['balance'].shift()
>>> df
  user_id  date  balance  balance_previous_day
0       A     1      100                   NaN
1       A     2      200                 100.0
2       A     3      250                 200.0
3       B     2      300                   NaN
4       B     3      200                 300.0

Note that this assumes your days are sorted for each user.

To get the difference:

>>> df['balance_change'] = df.groupby('user_id')['balance'].diff()
>>> df
  user_id  date  balance  balance_previous_day  balance_change
0       A     1      100                   NaN             NaN
1       A     2      200                 100.0           100.0
2       A     3      250                 200.0            50.0
3       B     2      300                   NaN             NaN
4       B     3      200                 300.0          -100.0
Cimbali
  • 11,012
  • 1
  • 39
  • 68