Calculating index values based on previous calculations using the same calculation

Question

I have a df:

import numpy as np
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C'],
                  columns=['1', '2', '3', '4'],
                  data=[[100, 60, 40, 60],
                        [200, 10, 50, 80],
                        [50, np.nan, np.nan, np.nan]])

        1           2           3           4        
A       100         60          40          60
B       200         10          50          80
C       50

I would like to calculate the remaining C index values, but each calculation is dependent on the previous value like so:

        1           2           3           4        
A       100         60          40          60
B       200         10          50          80
C       50          A2+B2-C1    A3+B3-C2    A4+B4-C3

I checked this answer and tried the following:

new = [df.loc['C'].values]

for i in range(1, len(df.index)):
    new.append(new[i-1]*df.loc['A'].values[i]+df.loc['B'].values[i]-df.loc['C'].values[i-1])
df.loc['C'] = new

But I get :

ValueError: cannot set a row with mismatched columns

Also, that question and its answers are quite outdated. Is there a newer way to do this kind of recursive calculation inside a pandas DataFrame?

Can you share a `df.to_dict()`, so we can understand the exact structure of the DataFrame? — azro, Jan 11 '22 at 11:41
Added `df` creation. `[i-1]` to use the previous column for `C`? — Jonas Palačionis, Jan 11 '22 at 11:48
Is performance important? Then use `numba`. If not, use a loop solution similar to the accepted answer in the linked question. — jezrael, Jan 11 '22 at 11:48
Performance is not important, I tried using the loop solution in the accepted answer but did not work as expected because of me having indexes while the answer uses column structure. — Jonas Palačionis, Jan 11 '22 at 11:50
I don't see any multiplication in the schema above, so why do I see one in the code ? — azro, Jan 11 '22 at 11:55

score 2 · Answer 1 · answered Jan 11 '22 at 11:54

The key: print your variables to make sure they contain what you think they do.

- `new = [df.loc['C'].values]` builds a list with a single item, which is an array. You just want a flat list.
- In the loop you use `new[i-1] *`, a multiplication that isn't in the schema above.
- You read `df.loc['C'].values[i-1]` but never update it (you save the results in a list), so you can't expect it to work. Either:
  - directly update the DF and use `df.loc['C'].values[i-1]`, or
  - keep the temporary list and use `new[i - 1]`.
- You don't want to append but to overwrite the values (otherwise you would need to start `new` with only one value).
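
To make the first point concrete, here is a small sketch that only inspects the two ways of reading row `C`. The names `wrapped` and `flat` are made up for illustration; the DataFrame is the one from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C'],
                  columns=['1', '2', '3', '4'],
                  data=[[100, 60, 40, 60],
                        [200, 10, 50, 80],
                        [50, np.nan, np.nan, np.nan]])

# Wrapping .values in brackets gives a list holding ONE array of 4 values
wrapped = [df.loc['C'].values]
# .to_list() gives a flat list with one entry per column
flat = df.loc['C'].to_list()

print(len(wrapped), wrapped[0].shape)  # 1 (4,)
print(len(flat))                       # 4
```

Appending to `wrapped` in the loop therefore grows a list of arrays, which no longer lines up with the four columns of the row.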

With a separate list

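new = df.loc['C'].to_list()

# Loop over the columns (not the index), starting from the second one
for i in range(1, len(df.columns)):
    new[i] = df.loc['A'].values[i] + df.loc['B'].values[i] - new[i - 1]

df.loc['C'] = new  # write the completed row back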

Without a separate list

for i in range(1, len(df.columns)):
    df.iloc[2, i] = df.iloc[1, i] + df.iloc[0, i] - df.iloc[2, i - 1]
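
For reference, here is a self-contained sketch of the same recurrence wrapped in a function. The helper name `fill_row` and its parameters are hypothetical, not a pandas API; it assumes the first value of the target row is already filled in and works on a copy of the frame.

```python
import numpy as np
import pandas as pd

def fill_row(df, target, sources):
    """Hypothetical helper: fill row `target` left to right, where each value is
    the column sum of the `sources` rows minus the previous `target` value."""
    out = df.copy()
    totals = out.loc[sources].sum(axis=0).to_numpy(dtype=float)
    values = out.loc[target].to_numpy(dtype=float, copy=True)
    for i in range(1, len(values)):
        values[i] = totals[i] - values[i - 1]
    out.loc[target] = values
    return out

df = pd.DataFrame(index=['A', 'B', 'C'],
                  columns=['1', '2', '3', '4'],
                  data=[[100, 60, 40, 60],
                        [200, 10, 50, 80],
                        [50, np.nan, np.nan, np.nan]])

result = fill_row(df, target='C', sources=['A', 'B'])
print(result.loc['C'].to_list())  # [50.0, 20.0, 70.0, 70.0]
```

Checking by hand against the schema in the question: C2 = 60 + 10 - 50 = 20, C3 = 40 + 50 - 20 = 70, C4 = 60 + 80 - 70 = 70.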

What happens if instead of columns being `1,2...` I have `2021-11-29, 2021-12-06 ...`? — Jonas Palačionis, Jan 11 '22 at 12:40
@JonasPalačionis the code doesn't use the column name at all, it shouldn't cause trouble. Also can't you just try and answer that question by yourself ? ;) — azro, Jan 11 '22 at 13:46
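
A quick way to check the point about column names is to run the positional loop on a frame whose columns are dates. This is only a sketch: the first two dates come from the comment above, the other two are assumed to continue weekly, and `row_a`, `row_b`, `row_c` are names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Column labels are dates; the last two are assumed for illustration
cols = pd.to_datetime(['2021-11-29', '2021-12-06', '2021-12-13', '2021-12-20'])
df = pd.DataFrame(index=['A', 'B', 'C'],
                  columns=cols,
                  data=[[100, 60, 40, 60],
                        [200, 10, 50, 80],
                        [50, np.nan, np.nan, np.nan]])

# Look up the row positions by label instead of hard-coding 0, 1, 2
row_a, row_b, row_c = (df.index.get_loc(label) for label in ['A', 'B', 'C'])

# The loop only uses integer positions, so the column labels never matter
for i in range(1, len(df.columns)):
    df.iloc[row_c, i] = df.iloc[row_a, i] + df.iloc[row_b, i] - df.iloc[row_c, i - 1]

print(df.iloc[row_c].to_list())  # [50.0, 20.0, 70.0, 70.0]
```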
