0

I am trying to run a simple count function which runs a dataframe of event times (specifically surgeries) against another dataframe of shift time frames, and returns a list of how many events occur during each shift. These csvs are thousands of rows, though, so while the way I have it set up currently works, it takes forever. This is what I have:

numSurgeries = [0 for shift in range(len(df.Date))]

for i in range(len(OR['PATIENT_IN_ROOM_DTTM'])):
    for shift in range(len(df.DateTime)):
        if OR['PATIENT_IN_ROOM_DTTM'][i] >= df.DateTime[shift] and OR['PATIENT_IN_ROOM_DTTM'][i] < df.DateTime[shift+1]:
            numSurgeries[shift] += 1

So it loops through each event and checks to see which shift time frame it is in, then increments the count for that time frame. Logical, works, but definitely not efficient.

EDIT:

Example of OR data file

Example of df data file

Charles Bushrow
  • 437
  • 4
  • 12
  • Welcome to StackOverflow! Please read about [how to ask a question](https://stackoverflow.com/help/how-to-ask) (particularly [how to create a good example](https://stackoverflow.com/help/mcve)) in order to get good responses. – Alex Jan 25 '18 at 17:01
  • In your case please provide an example DataFrame (`OR.head()`) and the expected output. – Alex Jan 25 '18 at 17:01

2 Answers2

0

Without example data, it's not absolutely clear what you want. But this should help you vectorise:

numSurgeries = {shift: np.sum((OR['PATIENT_IN_ROOM_DTTM'] >= df.DateTime[shift]) & \
                       (OR['PATIENT_IN_ROOM_DTTM'] < df.DateTime[shift+1])) \
                       for shift in range(len(df.Date))}

The output is a dictionary mapping integer shift to numSurgeries.

jpp
  • 159,742
  • 34
  • 281
  • 339
0

As mentioned above, it is hard to answer without example data.

However, a boolean mask sounds fitting. See Select dataframe rows between two dates.

Create a date mask from shift, we'll call the start and end dates start_shift and end_shift respectively. These should be in datetime format.

date_mask = (df['datetime'] >= start_shift) & (df['datetime'] <= end_shift)

Locate all values in df that fit this range.

df_shift = df.loc[date_mask]

Count the number of instances in the new df_shift.

num_surgeries = len(df_shift.index())

Cycle through all shifts.

def count_shifts(df, shift, results_df, start_shift, end_shift):

    date_mask = (df['datetime'] >= start_shift) & (df['datetime'] <= end_shift)
    df_shift = df.loc[date_mask]
    num_surgeries = len(df_shift.index())

    return(num_surgeries)

# iterates through df and applies the above function to every row
results_df['num_surgeries'] = results_df.apply(calculate_num_surgeries,axis=1)

Also remember to name variables according to PEP8 Style Guide! Camelcase is not recommended in Python.