Update pandas dataframe current row attribute based on its value in the previous row for each row

Question

I need to perform the following steps on a data-frame:

Assign a starting value to the "balance" attribute of the first row.
Calculate the "balance" values for the subsequent rows based on value of the previous row using the formula for eg : (previous row balance + 1)

I have tried the following steps:

Created the data-frame:

df = pd.DataFrame(pd.date_range(start = '2019-01-01', end = '2019-12-31'),columns = ['dt_id'])

Created attribute called 'balance':

df["balance"] = 0

Tried to conditionally update the data-frame:

df["balance"] = np.where(df.index == 0, 100, df["balance"].shift(1) + 1)

Results:

From what I can observe, the value is being retrieved for subsequent update before it can be updated in the original data-frame.

The desired output for "balance" attribute :

Row 0 : 100
Row 1: 101
Row 2 : 102

And so on

score 2 · Answer 1 · answered Sep 12 '19 at 07:37

2

If I understand correctly if you add this line of code after yours, you are ready:

df["balance"].cumsum()

0      100.0
1      101.0
2      102.0
3      103.0
4      104.0
       ...  
360    460.0
361    461.0
362    462.0
363    463.0
364    464.0

It is a cumulative sum, it sums its value with the previous one and since you have the starting value and then ones it will do what you want.

answered Sep 12 '19 at 07:37

Billy Bonaros

1,671
11
18

1

Yes that is what I am looking for ,thank you.i will give it a go and mark as solved. – Abhi Sep 12 '19 at 07:41
1

Just our of curiosity, in a scenario where I need to use the previous row's balance on a formula to calculate the present row's balance, how would I go about doing that ? – Abhi Sep 12 '19 at 07:51
You'll do the same because you only need the starting value. Every value is the previews +1 so you only need a starting value. Am I right? – Billy Bonaros Sep 12 '19 at 07:56

AnsFourtyTwo · Accepted Answer · 2019-09-12T08:17:12.180

The problem you have is, that you want to calculate an array and the elements are dependent on each other. So, e.g., element 2 depends on elemen 1 in your array. Element 3 depends on element 2, and so on.

If there is a simple solution, depends on the formula you use, i.e., if you can vectorize it. Here is a good explanation on that topic: Is it possible to vectorize recursive calculation of a NumPy array where each element depends on the previous one?

In your case a simple loop should do it:

balance = np.empty(len(df.index))
balance[0] = 100
for i in range(1, len(df.index)):
  balance[i] = balance[i-1] + 1  # or whatever formula you want to use

Please note, that above is the general solution. Your formula can be vectorized, thus also be generated using:

balance = 100 + np.arange(0, len(df.index))

Update pandas dataframe current row attribute based on its value in the previous row for each row

2 Answers2