1

I have a data set containing rain data. I'm wanting to set up a running rain accumulator using Pandas.

This is what I have:

def determinePrev(df):
    #NEED TO CHANGE VALUE TO RUNNING     VVVVVV#    
    df.loc[df['Rain'] > 0, 'Running']=df['Rain'].shift(1)+df['Rain']

    return df

Running it like that works, but it's only appropriate if there are only two days of rain in a row. When replacing one of the latter 'Rain's with 'Running', I get KeyError: 'Running'.

I've been trying to find a solution but it feels like I'm not getting anywhere. I'm relatively new to Python so if you do have a solution, can you please give as much detail as possible?

Thanks!

EDIT: I should add I'm not wanting to count consecutive days, rather measure the amount of rain that has fallen over the consecutive rainy days.

EDIT #2:

Picture of current output versus desired output

DYZ
  • 55,249
  • 10
  • 64
  • 93
Phanster
  • 65
  • 6
  • 2
    Can you describe an example of input data and expected output? Its hard understanding what you are trying to achieve. – yoav_aaa Jun 23 '20 at 06:17
  • 1
    sounds like a one-liner with `.cumsum()` but please post input and expected output. Feel free to reference this. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – David Erickson Jun 23 '20 at 06:34
  • Alright I added a screenshot of what I had in excel. I've not used the .cumsum() function but it looks like it might have the answers I seek! I'll give it a go and see if I can get it to work. – Phanster Jun 23 '20 at 06:37
  • Why is there a 0 in the last row of the last column? Shouldn't it be 29.8? – DYZ Jun 23 '20 at 06:40
  • Does this answer your question? [Cumulative sum and percentage on column?](https://stackoverflow.com/questions/20965046/cumulative-sum-and-percentage-on-column) – DYZ Jun 23 '20 at 06:41
  • .cumsum is working but it doesn't reset on 0's. Is it possible to reset it back to 0 if there's a 0 in the rain column? The purpose of the 'Correct Output' column is to show how significant the rain events are. i.e. it could show there were two periods in July where it rained three consecutive days and allow you to compare how much water fell in each rainy period. – Phanster Jun 23 '20 at 06:48
  • Can you also show the column for the date? – Christian Eslabon Jun 23 '20 at 06:49
  • Shubham Sharma's reply works! Thanks for your help everyone, and thanks for introducing me to .cumsum! Looks like a useful tool! – Phanster Jun 23 '20 at 06:53

1 Answers1

0

Use, Series.eq and Series.cumsum to calculate the cumalative series c which is used to groups the consecutive rainy days, then use Series.groupby to group the Rain column on this series and use the transformation function cumsum:

c = df['Rain'].eq(0).cumsum()
df['Running'] = df['Rain'].groupby(c).cumsum()

Result:

# print(df)
   Rain  Running
0   0.0      0.0
1   0.8      0.8
2   2.4      3.2
3  19.4     22.6
4   6.2     28.8
5   1.0     29.8
6   0.0      0.0
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53