I have a DataFrame of transactions on machines, ordered chronologically on start time (a datetime.time column). Machines have locations A, B, C etc. I want to create a flag column that shows if all machines in the location are currently being used, that is, the start time of the transaction is within the start/end times of the other machines at that location.
I am attempting to deduce when there could a queue for the machines by determining when all machines are busy.
My DataFrame is heuristically laid out below, currently missing the "all busy" column:
terminal ID MACHINE_LOCATION MACHINE_COUNT TRANS_TIME NEXT_TIME all busy
001 A 3 09:00:00 09:08:00 N
002 A 3 09:01:00 09:04:00 N
003 A 3 09:02:00 09:07:00 Y
002 A 3 09:06:00 09:07:00 Y
004 B 3 09:07:00 09:09:00 N
I wish to create the "all busy" column but cannot work out the logic.
EDIT: my attempt:
for i, r in df.iterrows():
subgr = df[(df['MACHINE_LOCATION'] == r['MACHINE_LOCATION'])&(df['TRANS_DTE'] ==
r['TRANS_DTE'])&(df['TRANS_HR'] == r['TRANS_HR'])]
if len(subgr[(subgr['TRANS_TIME'] < r['TRANS_TIME'])&(subgr['NEXT_TIME'] >
r['TRANS_TIME'])]) == r['MACHINE_COUNT']:
df.loc[i,'ALL_BUSY'] = 'Y'
else:
df.loc[i,'ALL_BUSY'] = 'N'
but this takes 34 days to finish running (!!!)
Can anyone vectorize this operation?