1

I have a DataFrame of transactions on machines, ordered chronologically on start time (a datetime.time column). Machines have locations A, B, C etc. I want to create a flag column that shows if all machines in the location are currently being used, that is, the start time of the transaction is within the start/end times of the other machines at that location.

I am attempting to deduce when there could a queue for the machines by determining when all machines are busy.

My DataFrame is heuristically laid out below, currently missing the "all busy" column:

terminal ID    MACHINE_LOCATION    MACHINE_COUNT    TRANS_TIME      NEXT_TIME all busy

001                  A                   3           09:00:00       09:08:00     N
002                  A                   3           09:01:00       09:04:00     N
003                  A                   3           09:02:00       09:07:00     Y
002                  A                   3           09:06:00       09:07:00     Y      
004                  B                   3           09:07:00       09:09:00     N

I wish to create the "all busy" column but cannot work out the logic.

EDIT: my attempt:

for i, r in df.iterrows():
    subgr = df[(df['MACHINE_LOCATION'] == r['MACHINE_LOCATION'])&(df['TRANS_DTE'] == 
              r['TRANS_DTE'])&(df['TRANS_HR'] == r['TRANS_HR'])]

    if len(subgr[(subgr['TRANS_TIME'] < r['TRANS_TIME'])&(subgr['NEXT_TIME'] > 
           r['TRANS_TIME'])]) == r['MACHINE_COUNT']:
        df.loc[i,'ALL_BUSY'] = 'Y'
    else:
        df.loc[i,'ALL_BUSY'] = 'N'

but this takes 34 days to finish running (!!!)

Can anyone vectorize this operation?

Mark
  • 934
  • 1
  • 10
  • 25
DeanP
  • 35
  • 5
  • 1
    What you do want? clarify it properly, please. The above things are not making sense. – Amazing Things Around You Jun 06 '19 at 09:42
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on how to ask a good question may also be useful. – yatu Jun 06 '19 at 09:43

1 Answers1

1

You can use the following code:

df['all_are_busy'] = df.groupby(['location'])['all busy'].transform(lambda x: x.eq('Y').all())

Careful: This will also return True if you have only one machine in a location

Carsten
  • 2,765
  • 1
  • 13
  • 28
  • My apologies, I did not clarify that I do not have the "all busy" column yet. – DeanP Jun 06 '19 at 09:57
  • Well, how about constructing a max_start_time and min_start_end via `groupby().transform()` and then check whether the particular start_time lies within that range? – Carsten Jun 06 '19 at 10:56
  • Probably my fault for not making this clearer, but this would not suffice as a solution. I need a rolling count of the number of machines currently in use at each location. – DeanP Jun 06 '19 at 12:36