calculating event duration in pandas

Question

I am attempting to reference this SO post where the first answer someone creates some code to analyze the time series data of rain fall and totalize in hours for each rain event. I want to do something similar except for totalizing how often a pump runs (event) in hours each day from time series data. There is no need on my end for totalizing anything else just the duration of the event.

#read CSV file
df = pd.read_csv('C:\\Users\\desktop\\data.csv', index_col='Date', parse_dates=True)

# Converting the index as date
df.index = pd.to_datetime(df.index)

df

My data looks like this:

                    ChWaterPrs
Date                                                                    
1/0/00 12:45 AM          0.0
1/0/00 12:50 AM          0.0
1/0/00 12:55 AM          0.0
1/0/00 12:00 AM          0.0
1/0/00 1:05 AM           0.0

I know that the pump is running everytime the df['ChWaterPrs'] is greater than 5, so I am attempting to build these helper colums as mentioned in the SO post:

# create helper columns, a block would be anytime value is > 5
df['block'] = df['ChWaterPrs'] >= 5
df['day'] = df.index.normalize()

# group by day to get unique block count and value count
session_map = df[df['ChWaterPrs'].astype(bool)].groupby('day')['block'].nunique()
hour_map = df[df['ChWaterPrs'].astype(bool)].groupby('day')['ChWaterPrs'].count()

# map to original dataframe
df['sessions'] = df['day'].map(session_map)
df['hours'] = df['day'].map(hour_map)

# calculate result
res = df.groupby(['day', 'hours', 'sessions'], as_index=False)['ChWaterPrs'].sum()
res['duration'] = res['hours'] / res['sessions']
res['amount'] = res['ChWaterPrs'] / res['sessions']

But if I print res something is screwed up... The durations don't make sense that they are greater than 24 hours in a day. Basically the only information I would love to have is a resample of the time series data for how many hours that the pump ran which is determined by the df['ChWaterPrs'] being greater than 5. I think the data would show everyday (a session) where the df['ChWaterPrs'] would be zero else greater than 5 when running. Any tips help thank you.

Yes, something happened to my CSV file which made it corrupt and I had to retrieve a new set of data which looks more like the screen shot. — bbartling, Feb 28 '19 at 15:19
It is in 5 minute intervals tho, a years worth of time series data — bbartling, Feb 28 '19 at 15:20

calculating event duration in pandas

0 Answers0