
Here is my raw data:

[Image: Raw Data]

Here is the data (including types) after I add the column 'Date_2wks_Ago' in Pandas:

[Image]

I would like to add a new column 'Rainfall_Last7Days' that holds, for each day, the total rainfall over the last week.

Ignoring the columns that aren't relevant, it would look something like this:

[Image: Ideal Dataset]

Anyone know how to do this in Pandas?

My data is about 1000 observations long, so not huge.

  • 1
    Please reformat the question so the data can be copied and used to help create an answer. Please see [this helpful question that'll show you what to do](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – mullinscr Feb 04 '21 at 22:58

1 Answer


I think what you are looking for is the pandas rolling() method.

This section recreates a simplified version of your table:

import pandas as pd

# Daily rainfall readings, one value per date in the list below
rainfall_from_9am = [4.6, 0.4, 3.6, 3.5, 3.2, 5.5, 2.2, 1.3,
                     0, 0, 0.04, 0, 0, 0, 0.04, 0.4]

date = ['2019-02-03', '2019-02-04', '2019-02-05', '2019-02-06',
        '2019-02-07', '2019-02-08', '2019-02-09', '2019-02-10',
        '2019-02-11', '2019-02-12', '2019-02-13', '2019-02-14',
        '2019-02-15', '2019-02-16', '2019-02-17', '2019-02-18']

# Create df from the lists (date first, matching the printed output)
df = pd.DataFrame({'date': date,
                   'rainfall_from_9am': rainfall_from_9am})

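This part calculates the rolling sum of rainfall over the current record and the previous 6 records. The first six rows come out as NaN because a full 7-record window is not yet available for them.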

df['rain_last7days']=df['rainfall_from_9am'].rolling(7).sum()

print(df)
          

Output:

          date  rainfall_from_9am  rain_last7days
0   2019-02-03               4.60             NaN
1   2019-02-04               0.40             NaN
2   2019-02-05               3.60             NaN
3   2019-02-06               3.50             NaN
4   2019-02-07               3.20             NaN
5   2019-02-08               5.50             NaN
6   2019-02-09               2.20           23.00
7   2019-02-10               1.30           19.70
8   2019-02-11               0.00           19.30
9   2019-02-12               0.00           15.70
10  2019-02-13               0.04           12.24
11  2019-02-14               0.00            9.04
12  2019-02-15               0.00            3.54
13  2019-02-16               0.00            1.34
14  2019-02-17               0.04            0.08
15  2019-02-18               0.40            0.48

I'm conscious that this output does not exactly match the example in your original question. Can you please confirm the logic you are after?
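One point that may matter for the logic: rolling(7) counts rows, not calendar days, so it only equals "the last 7 days" when there is exactly one row per day. A minimal sketch of a calendar-based window follows, assuming the dates are converted to datetimes and used as the index. The data is the first week of the table above with 2019-02-06 deliberately left out (a hypothetical gap, only for illustration).

```python
import pandas as pd

# First week of the sample data, with 2019-02-06 removed on purpose
# to imitate a missing observation (hypothetical gap for this sketch).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-02-03", "2019-02-04", "2019-02-05", "2019-02-07",
        "2019-02-08", "2019-02-09", "2019-02-10",
    ]),
    "rainfall_from_9am": [4.6, 0.4, 3.6, 3.2, 5.5, 2.2, 1.3],
})

# Offset-based windows need a sorted datetime index.
df = df.set_index("date").sort_index()

# "7D" covers the 7 calendar days ending on (and including) each row's date,
# however many rows fall inside that span.
df["rain_last7days"] = df["rainfall_from_9am"].rolling("7D").sum()

print(df)
```

With an offset window the early rows are not NaN, because pandas only requires one observation in the window by default. For 2019-02-10 the window runs from 2019-02-04 to 2019-02-10, so the 4.6 recorded on 2019-02-03 is no longer included.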

Bojan
  • Hi Bojan, this actually worked great. Don't worry about the original logic; this is a great function that I had not heard of, and it does the job well. I do need to tweak it, as it seems to be adding the current day twice. If you look at observation 10 in the output above and count back 7 days, you'll see the total is only 12.24, but the output is 12.28 (it is adding the 0.04 from observation 10 twice). – Jamie Oram Feb 05 '21 at 09:35
  • Hi Jamie, good pick up. This is an oversight and I have edited my initial response. – Bojan Feb 05 '21 at 18:35
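For anyone wanting to repeat the check on observation 10, here is a small sketch that compares rolling(7).sum() with a manual sum over rows 4 to 10 of the same data. The last line is an optional variation, not something from the answer above: it uses shift(1) to total the 7 records before each row, in case the current day should be excluded.

```python
import pandas as pd

rainfall = [4.6, 0.4, 3.6, 3.5, 3.2, 5.5, 2.2, 1.3,
            0, 0, 0.04, 0, 0, 0, 0.04, 0.4]
s = pd.Series(rainfall, name="rainfall_from_9am")

# Rolling total over the current record and the previous 6 records.
rolling_total = s.rolling(7).sum()

# Manual total for observation 10: rows 4 through 10 inclusive.
manual_total = s.iloc[4:11].sum()

print(round(rolling_total.iloc[10], 2), round(manual_total, 2))

# Variation (assumption): total of the 7 records *before* each row,
# i.e. the current day is left out of its own window.
prior_week_total = s.shift(1).rolling(7).sum()
```

If the two printed values agree, the current day is being counted once, not twice.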