I have a DataFrame and I want to create a column that holds, for each row, the sum of a given column over that row and the next 3 rows. I have managed to get this result using iterrows(); however, I know this is not the ideal way to do it. I have tried using apply() and lambda functions in several ways, but I could not make it work.
Here is the code that I wrote, which gets the desired result:
import pandas as pd

mpg_df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv')

sum_results = []
for index, row in mpg_df.iterrows():
    # Window covers the current row plus the next 3, clipped at the last row
    initial_range = index
    final_range = index+4 if index+4 <= mpg_df.shape[0] else mpg_df.shape[0]
    sum_result = mpg_df['mpg'].iloc[initial_range:final_range].sum()
    sum_results.append(sum_result)

mpg_df["special_sum"] = sum_results
mpg_df
My question is: how can I get the same result, that is, the "special_sum" column, without using iterrows()?
Edit: Personally, I do not have anything against iterrows(). However, I am trying to learn Pandas and its best practices, and according to this answer (How to iterate over rows in a DataFrame in Pandas), I should not be using iterrows(); they are quite explicit about that. I do not want to start a debate, I just want to know if there is a better way to accomplish the same task.
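For reference, one direction that seems to avoid the explicit loop is a forward-looking rolling sum. The following is only a minimal sketch under stated assumptions: it uses a small made-up DataFrame named df in place of the CSV above, and it relies on the default integer index, as the loop version does. Since rolling() looks backwards, the column is reversed, summed over a trailing window of 4, and reversed back.

```python
import pandas as pd

# Small stand-in for the 'mpg' column; the values are made up for illustration.
df = pd.DataFrame({"mpg": [18.0, 15.0, 18.0, 16.0, 17.0, 15.0]})

window = 4  # the current row plus the next 3 rows

# rolling() looks backwards, so reverse the column, take a trailing
# rolling sum, then reverse the result back into the original order.
# min_periods=1 lets the last rows sum over whatever rows remain.
forward_sum = (
    df["mpg"][::-1]
    .rolling(window, min_periods=1)
    .sum()[::-1]
)

# Assignment aligns on the index, so each sum lands on its own row.
df["special_sum"] = forward_sum
print(df)
```

With the real data, the same expression would presumably be applied to mpg_df['mpg'] instead of df["mpg"].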