I have a DataFrame with TIME_IN and TIME_OUT columns (datetimes up to the second). I want a new DF w/ the sum of the duration (TIME_OUT - TIME_IN) by date. Each day runs from 5AM - 5AM, so I adjust for that as well.
This is part of a mini-project to teach myself Pandas, but my next application will be much more involved, so EFFICIENCY is key for me.
I've tried two approaches (resample and groupby), but both have the same issue: the timedelta DURATION column is not summing.
df["DATE"] = pd.to_datetime((df["TIME_IN"]
- dt.timedelta(hours=hrEnd)).dt.date)
df["DURATION"] = df["TIME_OUT"] - df["TIME_IN"]
dfGroupBy= df.groupby("DATE").sum()
df.setindex("DATE", inplace=True)
dfResample = df.resample("D").sum()
It seems Pandas does not sum timedelta64 type columns the way I attempted, so the returned DataFrame simply does not include the DURATION column. What is the most efficient way to do this?