0

I want to compare the values of the current row and the next row using iterrows() for a Pandas df.

The code is:

yc_df = pd.DataFrame([[7,1,1990,0], [7.5,1,1990,0], [8,2,1990,0],[8.5,2,1990,0]], columns=['3_Mo', 'Month','Year', 'Avg_3_Mo'])

#initially, yc_df['Avg_3_Mo'] has been initialised to all 0s.

for index, row in yc_df.iterrows():
  if math.isnan(row['3_Mo']) == False:
    count += 1
    sum_3mo += row['3_Mo']
    avg_3mo = sum_3mo/count 

if row['Month']!= (row+1)['Month']: **#Here I want to compare the current row value with the next row**
  row['Avg_3_Mo'] = avg_3mo
  sum_3mo = 0 
  avg_3mo = 0

Not sure how to make this work.

Expected output:

df
   3_Mo  Month  Year  Avg_3_Mo
0  7      1     1990     0
1  7.5    1     1990     7.25
2  8      2     1990     0
3  8.5    2     1990     8.25

The expected output should have the 'Avg 3 Mo' column to be updated with the average value for '3 Mo' for each Month and Year (like a GROUP BY 'Month', 'Year'). So I want 'Avg 3 Mo' to have zeroes for all entries except when entries where the value of 'Month' changes from one row to the next. For these rows wherein the 'Month' value changes, I want to input the average month value.

Ansh Gupta
  • 81
  • 3
  • 7
  • Can you provide a minimal example DataFrame as input and the matching expected output? – mozway Jul 19 '22 at 12:44
  • I apologise, I have updated the changes in the Question above. – Ansh Gupta Jul 19 '22 at 13:44
  • This is not a minimal reproducible example. Please read [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – mozway Jul 19 '22 at 13:45
  • **DO NOT** post images of code, links to code, data, error messages, etc. - copy or type the text into the question. – itprorh66 Jul 19 '22 at 13:58
  • Thank you for your valuable comments, I have made the required changes to the question. – Ansh Gupta Jul 19 '22 at 14:04

1 Answers1

0

IIUC, use a groupby average, then mask the first row of each group:

g = yc_df.groupby(['Month', 'Year'])

yc_df['Avg_3_Mo'] = g['3_Mo'].transform('mean').mask(g.cumcount().eq(0), 0)

output:

   3_Mo  Month  Year  Avg_3_Mo
0   7.0      1  1990      0.00
1   7.5      1  1990      7.25
2   8.0      2  1990      0.00
3   8.5      2  1990      8.25
mozway
  • 194,879
  • 13
  • 39
  • 75