Pandas Dataframe iterating over rows

Question

Here is the dataset

import pandas as pd
d = {'Key':['A','A','A','A'],'Rank':[1,2,3,4],'col1': [15000,12000,6000,7000], 'col2': [15000,10000,0,0],'col4': [10000,10000,10000,10000],'col5': [0,0,0,0] }
df = pd.DataFrame(data=d)
df

Col1= Max values it can take
Col2=Current value it holds
Col4:Remaining value that should fit in any of these records.

I am trying to fill in the 'col5' with possible max value that it can take.Where 'Col1' defines its maximum limit and 'col2' shows its current value. If it fits max value then move to the next row. The value that it can fit is determined by 'col4'. Please see below example.

Example:

first record with rank 1 Col1=15000 and col2=15000 then move to next row.
second record with rank2 col1=12000 and col2=10000. Here we can see that its max is 12000 so I can add 2000 more, also need to make sure col5>2000 so col5=2000 and col4 for next record will be 10000-2000=8000

Here is the end dataset which should look like

Below is the code which I have tried

for index, row in df.iterrows():
    #print(row['col1'], row['col2'])
    if row['col1']>row['col2']:
        
        if (row['col1']-row['col2'])<row['col2']:
            row['col5']=row['col1']-row['col2']
        else:
            row['col5']=row['col2']
    #return
    print(row['col1'], row['col2'],row['col5'])

D-E-N · Accepted Answer · 2020-09-28T16:47:39.577

this should do your stuff (Updated with multiple keys):

import pandas as pd

d = {'Key': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'], 'Rank': [1, 2, 3, 4, 1, 2, 3, 4],
 'col1': [15000, 12000, 6000, 7000, 15000, 12000, 6000, 7000], 'col2': [15000, 10000, 0, 0, 15000, 10000, 0, 0],
 'col4': [10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000], 'col5': [0, 0, 0, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data=d)
print(df)

df_result = pd.DataFrame()

for group in df.groupby(df.Key):
    tmp_value = 0
    df_tmp = group[1]
    for index, row in df_tmp.iterrows():
        if tmp_value == 0:
            tmp_value = row['col4']
        # print(row['col1'], row['col2'])
        if row['col1'] > row['col2']:
            diff_value = row['col1'] - row['col2']
            if diff_value < tmp_value:
                df_tmp.at[index, 'col5'] = row['col1'] - row['col2']
                tmp_value = tmp_value - diff_value
            else:
                df_tmp.at[index, 'col5'] = tmp_value
                break
    df_result = df_result.append(df_tmp)

print(df_result)

A few hints:
The tmp_value holds the data from col 4 to decrease over time.
you should break with break, not with exit in my mind
Here you can read about editing panda rows during iterating over it: Update a dataframe in pandas while iterating row by row.
edit: You also can get the key data first and save the 'col4'-data in an array and change the original dataframe directly, but thats up to you

Thanks for taking time. This works great, but only works when there is only one key(A). if there are multiple(A,B,C) this logic isnt working for B and C key records. — HKE, Sep 25 '20 at 18:28
Ok, please give an example for data with another key (your example has only one key), if there are multiple keys, you only have to add a check the right key, i think, but i may not imagine correctly where the second key comes into account. can i only push the value to the rows with the same key? and Is it allowd to change the values in col4? — D-E-N, Sep 25 '20 at 19:10
same data with one more key d = {'Key': ['A', 'A', 'A', 'A','B','B','B','B'], 'Rank': [1, 2, 3, 4,1, 2, 3, 4], 'col1': [15000, 12000, 6000, 7000,15000, 12000, 6000, 7000], 'col2': [15000, 10000, 0, 0,15000, 10000, 0, 0], 'col4': [10000, 10000, 10000, 10000,10000, 10000, 10000, 10000], 'col5': [0, 0, 0, 0,0, 0, 0, 0]} — HKE, Sep 25 '20 at 19:17
just a minor correction. Temp_value should be after groupby loop which I have edited. Thanks for your help D-E-N. This is a great help — HKE, Sep 28 '20 at 11:52

Pandas Dataframe iterating over rows

1 Answers1