Flag column for when all machines are busy?

Question

I have a DataFrame of transactions on machines, ordered chronologically on start time (a datetime.time column). Machines have locations A, B, C etc. I want to create a flag column that shows if all machines in the location are currently being used, that is, the start time of the transaction is within the start/end times of the other machines at that location.

I am attempting to deduce when there could a queue for the machines by determining when all machines are busy.

My DataFrame is heuristically laid out below, currently missing the "all busy" column:

terminal ID    MACHINE_LOCATION    MACHINE_COUNT    TRANS_TIME      NEXT_TIME all busy

001                  A                   3           09:00:00       09:08:00     N
002                  A                   3           09:01:00       09:04:00     N
003                  A                   3           09:02:00       09:07:00     Y
002                  A                   3           09:06:00       09:07:00     Y      
004                  B                   3           09:07:00       09:09:00     N

I wish to create the "all busy" column but cannot work out the logic.

EDIT: my attempt:

for i, r in df.iterrows():
    subgr = df[(df['MACHINE_LOCATION'] == r['MACHINE_LOCATION'])&(df['TRANS_DTE'] == 
              r['TRANS_DTE'])&(df['TRANS_HR'] == r['TRANS_HR'])]

    if len(subgr[(subgr['TRANS_TIME'] < r['TRANS_TIME'])&(subgr['NEXT_TIME'] > 
           r['TRANS_TIME'])]) == r['MACHINE_COUNT']:
        df.loc[i,'ALL_BUSY'] = 'Y'
    else:
        df.loc[i,'ALL_BUSY'] = 'N'

but this takes 34 days to finish running (!!!)

Can anyone vectorize this operation?

What you do want? clarify it properly, please. The above things are not making sense. — Amazing Things Around You, Jun 06 '19 at 09:42
Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on how to ask a good question may also be useful. — yatu, Jun 06 '19 at 09:43

Carsten · Answer 1 · 2019-06-06T09:52:44.067

1

You can use the following code:

df['all_are_busy'] = df.groupby(['location'])['all busy'].transform(lambda x: x.eq('Y').all())

Careful: This will also return True if you have only one machine in a location

edited Jun 06 '19 at 09:52

answered Jun 06 '19 at 09:47

Carsten

2,765
1
13
28

My apologies, I did not clarify that I do not have the "all busy" column yet. – DeanP Jun 06 '19 at 09:57
Well, how about constructing a max_start_time and min_start_end via `groupby().transform()` and then check whether the particular start_time lies within that range? – Carsten Jun 06 '19 at 10:56
Probably my fault for not making this clearer, but this would not suffice as a solution. I need a rolling count of the number of machines currently in use at each location. – DeanP Jun 06 '19 at 12:36

Flag column for when all machines are busy?

1 Answers1