1

I am trying to compute for the “maximum rainfall” in different durations, say, “1hr” and “3hr” duration.

Using my dataset:

Date                                  Rainfall
2007-01-01 00:00:00                       0 
2007-01-01 01:00:00                      0.5    
2007-01-01 02:00:00                      0.9
2007-01-01 03:00:00                      0.2
…..
2009-01-01 00:00:00                       0 
2009-01-01 01:00:00                      0.4    
2009-01-01 02:00:00                      0.8
2009-01-01 03:00:00                      0.9

If I will set my date range (e.g. 2007-01-01 to 2009-01-01), and I specify it to be maximum “3hr” duration, the computation should look like this (this should be in a for loop, I guess):

Starting at 2007-01-01 00:00:00 , the maximum 3hr is :

0+0.5+0.9=1.4

Then starting at 2007-01-01 01:00:00 , the maximum 3hr is :

0.5+0.9+0.2=1.6

And so on… until it finds the real maximum 3hr duration when it reach 2009-01-01 03:00:00.

I know this needs to have a for loop, I am new to python so I’m still learning. Anyone who can help me shed light about this?

karate
  • 17
  • 8

1 Answers1

0

I think you need DataFrame.rolling with DatetimeIndex and frequence 3H with sum:

df['Date'] = pd.to_datetime(df['Date'])

df1 = df.set_index('Date').rolling('3H').sum()
print (df1)
                     Rainfall
Date                         
2007-01-01 00:00:00       0.0
2007-01-01 01:00:00       0.5
2007-01-01 02:00:00       1.4
2007-01-01 03:00:00       1.6

If need maximum per 3 hours:

df2 = df.set_index('Date').rolling('3H').max()
print (df2)
                     Rainfall
Date                         
2007-01-01 00:00:00       0.0
2007-01-01 01:00:00       0.5
2007-01-01 02:00:00       0.9
2007-01-01 03:00:00       0.9
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • but unfortunately, due to the large data sets I have, rolling sum seems unable to handle it. the data frame itself doesnt load.. – karate Apr 29 '20 at 02:01
  • @karate - What is number of rows in file? What is your code for load DataFrame? Because I have no experience problem with load DataFrame. – jezrael Apr 29 '20 at 04:25
  • hi, I was able to do the rolling sum once i transform it to csv. however, if i want the rolling sum to be per Year, what should I add in the code? – karate Apr 29 '20 at 04:56
  • @karate - Do you think change `.rolling('3H')` to `.rolling('Y')` ? – jezrael Apr 29 '20 at 05:05
  • no. i'm planning to still use rolling sum every 3 hours. But the highest 3 hour rolling sum per year will be my output.. – karate Apr 29 '20 at 05:08
  • @karate - So need `df1 = df.set_index('Date').rolling('3H').sum() df1 = df1.loc[df1.groupby(df1.index.year)['Rainfall'].idxmax()]` ? – jezrael Apr 29 '20 at 05:13
  • can you explain what idxmax () is for? – karate Apr 29 '20 at 05:36
  • @karate - It return one maximal row per year, check [this](https://stackoverflow.com/q/15705630) – jezrael Apr 29 '20 at 05:37