I'm analyzing timecard data and comparing employees' clock-in/out times to each other. I'm exploring the data using a difference matrix in a DataFrame. How do I convert the day/hour timedelta to a decimal, or even just a sensible +/- value, without the -1 days +23:40:00 weirdness?
from datetime import datetime
import pandas as pd
employees = [('GILL', datetime(2022,12,1,6,40,0), datetime(2022,12,1,14,30,0)),
('BOB', datetime(2022,12,1,6,0,0), datetime(2022,12,1,14,10,0)),
('TOBY', datetime(2022,12,1,14,0,0), datetime(2022,12,1,22,30,0))]
labels = ['name', 'clockin', 'clockout']
df = pd.DataFrame.from_records(employees, columns=labels)
My difference matrix is constructed with these two lines:
arr = (df['clockin'].values - df['clockin'].values[:, None])
pd.concat((df['name'], pd.DataFrame(arr, columns=df['name'])), axis=1)
| | name | GILL | BOB | TOBY |
|---|---|---|---|---|
| 0 | GILL | 0 days 00:00:00 | -1 days +23:20:00 | 0 days 07:20:00 |
| 1 | BOB | 0 days 00:40:00 | 0 days 00:00:00 | 0 days 08:00:00 |
| 2 | TOBY | -1 days +16:40:00 | -1 days +16:00:00 | 0 days 00:00:00 |
The trick to getting a decimal difference is the pandas .dt accessor's total_seconds() method, but I can't find a place for it in the arr array expression.
Here is total_seconds() doing its magic on a single column:
df['workhours'] = round((df['clockout'] - df['clockin']).dt.total_seconds() / 60.0 / 60.0, 2)
I tried an apply on the time columns, but I can't get it to work. This might be the easy answer.
df_in.apply(lambda x: (x.total_seconds() / 60.0 / 60.0), columns=['BOB', 'GILL', 'TOBY'])
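For reference, here is a minimal sketch of two ways the matrix could be turned into signed decimal hours, assuming that is the goal. The names `hours`, `diff_hours`, and `via_apply` are made up for illustration. The first divides the NumPy timedelta array by a one-hour `np.timedelta64`; the second applies `.dt.total_seconds()` column by column, which is roughly what the `apply` attempt above was reaching for.

```python
from datetime import datetime

import numpy as np
import pandas as pd

employees = [
    ('GILL', datetime(2022, 12, 1, 6, 40, 0), datetime(2022, 12, 1, 14, 30, 0)),
    ('BOB', datetime(2022, 12, 1, 6, 0, 0), datetime(2022, 12, 1, 14, 10, 0)),
    ('TOBY', datetime(2022, 12, 1, 14, 0, 0), datetime(2022, 12, 1, 22, 30, 0)),
]
df = pd.DataFrame.from_records(employees, columns=['name', 'clockin', 'clockout'])

# Same difference matrix as in the question: arr[i, j] = clockin[j] - clockin[i]
clockin = df['clockin'].values
arr = clockin - clockin[:, None]

# Option 1: dividing a timedelta64 array by a one-hour timedelta yields plain floats
hours = arr / np.timedelta64(1, 'h')
diff_hours = pd.DataFrame(hours, index=df['name'], columns=df['name']).round(2)

# Option 2: each column is a timedelta Series, so the .dt accessor is available per column
via_apply = pd.DataFrame(arr, columns=df['name']).apply(
    lambda col: col.dt.total_seconds() / 3600
)

print(diff_hours)
```

With either option, negative differences come out as ordinary negative numbers (BOB relative to GILL is about -0.67 hours) instead of the -1 days +23:20:00 form.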