I'm looking to do calculations in a local range for each row in a dataframe while avoiding a slow for
loop. For example, for each row in the data below I want to find the maximum temperature within the next 3 days (including current day) and the total amount of rain within the next 3 days:
Day Temperature Rain
0 30 4
1 31 14
2 31 0
3 30 0
4 33 5
5 34 0
6 32 0
7 33 2
8 31 5
9 29 9
The ideal output would then be the new columns as in the table below. TempMax of Day 0 shows the highest temperature between Day 0 and Day 2, RainTotal shows the sum of rain between Day 0 and Day 2:
Day Temperature Rain TempMax RainTotal
0 30 4 31 18
1 31 14 31 14
2 31 0 33 5
3 30 0 34 5
4 33 5 34 5
5 34 0 34 2
6 32 0 33 7
7 33 2 33 16
8 31 5 31 14
9 29 9 29 9
Currently I'm using a for
loop:
# Make empty arrays to store each row's max & sum values
temp_max = np.zeros(len(df))
rain_total = np.zeros(len(df))
# Loop through the df and do operations in the local range [i:i+2]
for i in range(len(df)):
temp_max[i] = df['Temperature'].iloc[i:i+2].max()
rain_total = df['Rain'].iloc[i:i+2].sum()
# Insert the arrays to df
df['TempMax'] = temp_max
df['RainTotal'] = rain_total
The for
loop gets the job done but takes 50 minutes with my dataframe. Any chance this can be vecrotized or made faster some other way?
Thanks a bunch!