
I have a dataframe that I want to group by name. Once it is grouped, I want to go through each row of each group and update the values of a column so that I can then do other operations.

The problem is that when I update a row, the value is updated in the dataframe, but the row object still holds the old value.

For example, in this case df_group.Age outputs 25, which is the updated value, but row.Age outputs 20, which is the old value. How can I make row.Age update within the same iteration so that I can keep using the updated value?

import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D', 'A', 'B', 'D'],
        'Age': [20, 21, 19, 18, 21, 19, 18],
        'Size': [7, 7, 9, 8, 7, 9, 8]}
df = pd.DataFrame(data).sort_values(by='Name').reset_index(drop=True)

df['New_age'] = 0

df_grouped = df.groupby(['Name'])

for group_name, df_group in df_grouped:
    for row in df_group.itertuples():
        if row.Age == 20:
            df_group.at[row.Index, 'Age'] = 25
        print(df_group.Age)
        print(row.Age)

        #Do things with the row.Age value = 25
JDK
  • 1
    But what do you want to do? Are you sure you need `itertuples`? Best would be to try to vectorize – mozway Feb 27 '23 at 08:35
  • When you modify a row using `df_group.at[row.Index, 'Age'] = 25` , you are actually modifying the value in the original `DataFrame` . – EL Amine Bechorfa Feb 27 '23 at 08:40

2 Answers

2

The row.Age value is not updated inside the itertuples loop because the row object is a named tuple, which is immutable. To get what you want, update the value in the DataFrame with the df.loc accessor and rebind row to a new named tuple with _replace, which returns a copy with the updated field:

for group_name, df_group in df_grouped:
    for row in df_group.itertuples():
        if row.Age == 20:
            df.loc[row.Index, 'Age'] = 25
            row = row._replace(Age=25)  # update the named tuple
        print(df_group.Age)
        print(row.Age)
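
As a self-contained illustration of the point above, the following sketch uses a small made-up DataFrame (not the asker's data). It shows that assigning to a field of the tuple yielded by itertuples fails, while _replace returns a new tuple that can be bound back to row:

```python
import pandas as pd

# Hypothetical toy data, chosen only for illustration.
df = pd.DataFrame({'Name': ['A', 'A', 'B'], 'Age': [20, 21, 19]})

seen_ages = []
assignment_failed = False
for row in df.itertuples():
    if row.Age == 20:
        try:
            row.Age = 25  # named tuples are immutable, so this raises
        except AttributeError:
            assignment_failed = True
        df.at[row.Index, 'Age'] = 25   # update the DataFrame itself
        row = row._replace(Age=25)     # rebind `row` to an updated copy
    seen_ages.append(row.Age)          # later code in this iteration sees 25

print(seen_ages)
print(df['Age'].tolist())
```

Note that _replace does not change the original tuple; it only works here because the name row is rebound to the returned copy.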
1

Do you need to update the original DataFrame? Then use df instead of df_group:

df.at[row.Index, 'Age'] = 25

I suggest avoiding loops in pandas; it is best to vectorize where possible, as is the case here.
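
For the update in the question, a vectorized version might look like the sketch below. It assumes the only goal is to set Age to 25 wherever it equals 20; any further per-row processing would have to be expressed as column operations as well:

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D', 'A', 'B', 'D'],
        'Age': [20, 21, 19, 18, 21, 19, 18],
        'Size': [7, 7, 9, 8, 7, 9, 8]}
df = pd.DataFrame(data).sort_values(by='Name').reset_index(drop=True)

# Boolean mask selects the rows to change; no Python-level loop is needed.
df.loc[df['Age'] == 20, 'Age'] = 25

print(df)
```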

If you need to loop over groups and process the values per group, use a custom function:

def f(x):
    print(x)
    # processing
    # x.loc[x.Age == 20, 'Age'] = 25
    # x['new'] = 'output of processing'
    return x

df1 = df.groupby(['Name']).apply(f)
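
A runnable sketch of this pattern follows. The function name process_group, its body, and the Group_max_age column are assumptions made for illustration. The columns are selected explicitly so that the function receives the same data whether or not the installed pandas version passes the grouping column to apply:

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D', 'A', 'B', 'D'],
        'Age': [20, 21, 19, 18, 21, 19, 18],
        'Size': [7, 7, 9, 8, 7, 9, 8]}
df = pd.DataFrame(data).sort_values(by='Name').reset_index(drop=True)

def process_group(x):
    # Hypothetical per-group processing, called once per Name group.
    x = x.copy()                          # avoid modifying the group in place
    x.loc[x['Age'] == 20, 'Age'] = 25     # same update as in the loop version
    x['Group_max_age'] = x['Age'].max()   # uses the already updated Age values
    return x

df1 = df.groupby('Name', group_keys=False)[['Age', 'Size']].apply(process_group)
print(df1.sort_index())
```

Because the update happens on the whole group before any later step, there is no stale row object to keep in sync.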
jezrael
  • Yes. The problem is that I need to update the new DataFrame, and the .at command works perfectly for that, but I also need to keep operating on the row.Age value, which still holds the old value. It seems that the row's value cannot be updated until the iteration is over, even though the value in the DataFrame itself is updated. – JDK Feb 27 '23 at 08:38
  • @JDK - hmmm, is looping by `itertuples` necessary? – jezrael Feb 27 '23 at 08:39
  • Maybe it is not necessary, but how can I loop row by row over each group if not with itertuples? – JDK Feb 27 '23 at 08:41
  • @JDK - Added to answer - you can call function. – jezrael Feb 27 '23 at 08:44
  • 1
    Thanks for the modifications, this should be enough for me! – JDK Feb 27 '23 at 08:44