0

I am attempting to follow this first answer of this SO post for calculating an event duration. The example in the post is rainfall duration and the person wants to know a totalization of the rainfall and a duration in hours of the rainfall event.

My scenario is a similar time series but the application is a pump and I am wanting to know a totalized duration in hours per day that the pump runs. My data is a pump speed command, and anytime the pump speed is greater than 0.0 the pump is running.

To start with, I am reading my CSV file into Pandas.

#read CSV file
df = pd.read_csv('C:\\Users\\desktop\\data.csv', index_col='Date', parse_dates=True)

# Converting the index as date
df.index = pd.to_datetime(df.index)

df

Except I am running into an issue when attempting to convert my Date index into date time. This returns a ValueError: day is out of range for month

Would anyone know a fix for this? Ultimetely this is the code that I am attempting to recreate from the SO post 1st answer where the author is creating help columns...

# create helper columns defining contiguous blocks and day
df['block'] = (df['Pump4VFD'].astype(bool).shift() != df['Pump4VFD'].astype(bool)).cumsum()
df['day'] = df.index.dt.normalize()

# group by day to get unique block count and value count
session_map = df[df['value'].astype(bool)].groupby('day')['block'].nunique()
hour_map = df[df['value'].astype(bool)].groupby('day')['value'].count()

# map to original dataframe
df['sessions'] = df['day'].map(session_map)
df['hours'] = df['day'].map(hour_map)

# calculate result
res = df.groupby(['day', 'hours', 'sessions'], as_index=False)['value'].sum()
res['duration'] = res['hours'] / res['sessions']
res['amount'] = res['value'] / res['sessions']

My data looks like this:

                    Pump4VFD
Date                                                                    
1/0/00 12:45 AM          0.0
1/0/00 12:50 AM          0.0
1/0/00 12:55 AM          0.0
1/0/00 12:00 AM          0.0
1/0/00 1:05 AM           0.0
bbartling
  • 3,288
  • 9
  • 43
  • 88

1 Answers1

1

You might want to add dayfirst=True to pd.to_datetime

df.index = pd.to_datetime(df.index,dayfirst=True)

This might be one way to go about;but if does not work then try the following addition:

df.index = pd.to_datetime(df.index,dayfirst=True,infer_datetime_format=True)

If you need further clarification try this post:ValueError: day is out of range for month