1

I have a database and I want to create a column has the sum of that row and the next 3 rows of a given column. I have manage to accomplish this result using interrows(), however, I know this is not the ideal way to do this. I have tried using apply() and lambda functions in multiple ways, but I could not make it work.

Here is the code that I wrote, which gets the desired result:

import pandas as pd
mpg_df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv')

sum_results = []

for index, row in mpg_df.iterrows():
    inital_range = index
    final_range = index+4 if index+4 <= mpg_df.shape[0] else mpg_df.shape[0]

    sum_result = mpg_df['mpg'].iloc[inital_range:final_range].sum()
    sum_results.append(sum_result)

mpg_df["special_sum"] = sum_results
mpg_df

My question is, how can I get the same result, there is the "special_sum" column, without using interrows()?

Edit: Personally, I do not have anything against interrrows(), however, I am trying to learn Pandas and best practices and according to this answer (How to iterate over rows in a DataFrame in Pandas), I should not be using interrows(), they are quite explicit about that. I do not want to create a debate, I just want to know if there is a better way to accomplish the same task.

Nick ODell
  • 15,465
  • 3
  • 32
  • 66
user3347814
  • 1,138
  • 9
  • 28
  • 50
  • 1
    What is wrong with `iterrows`? – Scott Hunter Jul 20 '23 at 17:38
  • Personally, I do not have anything against this, however, I am trying to learn Pandas and best practices and according to this answer (https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas), I should not be using interrows(), they are quite explicit about that. – user3347814 Jul 20 '23 at 17:41
  • 3
    Probably you want https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html – Michael Butscher Jul 20 '23 at 17:46

2 Answers2

2

You can try use to use custom indexer in .rolling() method:

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=4)
mpg_df["special_sum_2"] = mpg_df.rolling(indexer, min_periods=1)['mpg'].sum()

# print head and tail of final dataframe
print(mpg_df.head(10))
print()
print(mpg_df.tail(10))

Prints:

    mpg  cylinders  displacement  horsepower  weight  acceleration  model_year origin                       name  special_sum  special_sum_2
0  18.0          8         307.0       130.0    3504          12.0          70    usa  chevrolet chevelle malibu         67.0           67.0
1  15.0          8         350.0       165.0    3693          11.5          70    usa          buick skylark 320         66.0           66.0
2  18.0          8         318.0       150.0    3436          11.0          70    usa         plymouth satellite         66.0           66.0
3  16.0          8         304.0       150.0    3433          12.0          70    usa              amc rebel sst         62.0           62.0
4  17.0          8         302.0       140.0    3449          10.5          70    usa                ford torino         60.0           60.0
5  15.0          8         429.0       198.0    4341          10.0          70    usa           ford galaxie 500         57.0           57.0
6  14.0          8         454.0       220.0    4354           9.0          70    usa           chevrolet impala         57.0           57.0
7  14.0          8         440.0       215.0    4312           8.5          70    usa          plymouth fury iii         58.0           58.0
8  14.0          8         455.0       225.0    4425          10.0          70    usa           pontiac catalina         58.0           58.0
9  15.0          8         390.0       190.0    3850           8.5          70    usa         amc ambassador dpl         59.0           59.0

      mpg  cylinders  displacement  horsepower  weight  acceleration  model_year  origin                        name  special_sum  special_sum_2
388  26.0          4         156.0        92.0    2585          14.5          82     usa  chrysler lebaron medallion        116.0          116.0
389  22.0          6         232.0       112.0    2835          14.7          82     usa              ford granada l        117.0          117.0
390  32.0          4         144.0        96.0    2665          13.9          82   japan            toyota celica gt        122.0          122.0
391  36.0          4         135.0        84.0    2370          13.0          82     usa           dodge charger 2.2        134.0          134.0
392  27.0          4         151.0        90.0    2950          17.3          82     usa            chevrolet camaro        130.0          130.0
393  27.0          4         140.0        86.0    2790          15.6          82     usa             ford mustang gl        131.0          131.0
394  44.0          4          97.0        52.0    2130          24.6          82  europe                   vw pickup        135.0          135.0
395  32.0          4         135.0        84.0    2295          11.6          82     usa               dodge rampage         91.0           91.0
396  28.0          4         120.0        79.0    2625          18.6          82     usa                 ford ranger         59.0           59.0
397  31.0          4         119.0        82.0    2720          19.4          82     usa                  chevy s-10         31.0           31.0
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
1

Maybe try creating a list of rows?

rows = mpg_df.values.tolist()

Then for each row, calculate the sum of its own value plus the next 3 rows.

sum_results = []
for i in range(len(rows)):
    # Make j equal to the index of the list that 'mpg' is
    sum_result = rows[i][j] + rows[i+1][j] + rows[i+2][j] + rows[i+3][j]
    sum_results.apppend(sum_result)