I have data that looks like this:
id Date Time assigned_pat_loc prior_pat_loc Activity
0 45546325 2/7/2011 4:29:38 EIAB^EIAB^6 NaN Admission
1 45546325 2/7/2011 5:18:22 8W^W844^A EIAB^EIAB^6 Observation
2 45546325 2/7/2011 5:18:22 8W^W844^A EIAB^EIAB^6 Transfer to 8W
3 45546325 2/7/2011 6:01:44 8W^W858^A 8W^W844^A Bed Movement
4 45546325 2/7/2011 7:20:44 8W^W844^A 8W^W858^A Bed Movement
5 45546325 2/9/2011 18:36:03 8W^W844^A NaN Discharge-Observation
6 45666555 3/8/2011 20:22:36 EIC^EIC^5 NaN Admission
7 45666555 3/9/2011 1:08:04 53^5314^A EIC^EIC^5 Admission
8 45666555 3/9/2011 1:08:04 53^5314^A EIC^EIC^5 Transfer to 53
9 45666555 3/9/2011 17:03:38 53^5336^A 53^5314^A Bed Movement
I need to find where there were multiple patients (identified with id
column) are in the same room at the same time, the start and end times for those, the dates, and room number (assigned_pat_loc
). assigned_pat_loc
is the current patient location in the hospital, formatted as “unit^room^bed”.
So far I've done the following:
# Read in CSV file and remove bed number from patient location
data = pd.read_csv('raw_data.csv')
data['assigned_pat_loc'] = data['assigned_pat_loc'].str.replace(r"([^^]+\^[^^]+).*", r"\1", regex=True)
# Convert Date column to datetime type
patient_data['Date'] = pd.to_datetime(patient_data['Date'])
# Sort dataframe by date
patient_data.sort_values(by=['Date'], inplace = True)
# Identify rows with duplicate room and date assignments, indicating multiple patients shared room
same_room = patient_data.duplicated(subset = ['Date','assigned_pat_loc'])
# Assign duplicates to new dataframe
df_same_rooms = patient_data[same_room]
# Remove duplicate patient ids but keep latest one
no_dups = df_same_rooms.drop_duplicates(subset = ['id'], keep = 'last')
# Group patients in the same rooms at the same times together
df_shuf = pd.concat(group[1] for group in df_same_rooms.groupby(['Date', 'assigned_pat_loc'], sort=False))
And then I'm stuck at this point:
id Date Time assigned_pat_loc prior_pat_loc Activity
599359 42963403 2009-01-01 12:32:25 11M^11MX 4LD^W463^A Transfer
296155 42963484 2009-01-01 16:41:55 11M^11MX EIC^EIC^2 Transfer
1373 42951976 2009-01-01 15:51:09 11M^11MX NaN Discharge
362126 42963293 2009-01-01 4:56:57 11M^11MX EIAB^EIAB^6 Transfer
362125 42963293 2009-01-01 4:56:57 11M^11MX EIAB^EIAB^6 Admission
... ... ... ... ... ... ...
268266 46381369 2011-09-09 18:57:31 54^54X 11M^1138^A Transfer
16209 46390230 2011-09-09 6:19:06 10M^1028 EIAB^EIAB^5 Admission
659699 46391825 2011-09-09 14:28:20 9W^W918 EIAB^EIAB^3 Transfer
659698 46391825 2011-09-09 14:28:20 9W^W918 EIAB^EIAB^3 Admission
268179 46391644 2011-09-09 17:48:53 64^6412 EIE^EIE^3 Admission
Where you can see different patients in the same room at the same time, but I don't know how to extract those intervals of overlap between two different rows for the same room and same times. And then to format it such that the start time
and end time
are related to the earlier and later times of the transpiring of a shared room between two patients. Below is the desired output.
Where r_id
is the id
of the other patient sharing the same room and length
is the number of hours that room was shared.