I am attempting to follow this first answer of this SO post for calculating an event duration. The example in the post is rainfall duration and the person wants to know a totalization of the rainfall and a duration in hours of the rainfall event.
My scenario is a similar time series but the application is a pump and I am wanting to know a totalized duration in hours per day that the pump runs. My data is a pump speed command, and anytime the pump speed is greater than 0.0 the pump is running.
To start with, I am reading my CSV file into Pandas.
#read CSV file
df = pd.read_csv('C:\\Users\\desktop\\data.csv', index_col='Date', parse_dates=True)
# Converting the index as date
df.index = pd.to_datetime(df.index)
df
Except I am running into an issue when attempting to convert my Date
index into date time. This returns a ValueError: day is out of range for month
Would anyone know a fix for this? Ultimetely this is the code that I am attempting to recreate from the SO post 1st answer where the author is creating help columns...
# create helper columns defining contiguous blocks and day
df['block'] = (df['Pump4VFD'].astype(bool).shift() != df['Pump4VFD'].astype(bool)).cumsum()
df['day'] = df.index.dt.normalize()
# group by day to get unique block count and value count
session_map = df[df['value'].astype(bool)].groupby('day')['block'].nunique()
hour_map = df[df['value'].astype(bool)].groupby('day')['value'].count()
# map to original dataframe
df['sessions'] = df['day'].map(session_map)
df['hours'] = df['day'].map(hour_map)
# calculate result
res = df.groupby(['day', 'hours', 'sessions'], as_index=False)['value'].sum()
res['duration'] = res['hours'] / res['sessions']
res['amount'] = res['value'] / res['sessions']
My data looks like this:
Pump4VFD
Date
1/0/00 12:45 AM 0.0
1/0/00 12:50 AM 0.0
1/0/00 12:55 AM 0.0
1/0/00 12:00 AM 0.0
1/0/00 1:05 AM 0.0