I have a dataset of parking transactions with entry and exit timestamps, and I want to find the occupancy of the parking building at the time each transaction takes place.
Sample dataset:
import pandas as pd

# Stack Overflow sample
data = {'ENTRY DATE': ['10/1/2022 12:00:09 AM',
                       '10/1/2022 12:01:49 AM',
                       '10/1/2022 12:02:54 AM',
                       '10/1/2022 12:12:30 AM',
                       '10/1/2022 12:15:24 AM'],
        'EXIT DATE': ['10/1/2022 12:22:20 AM',
                      '10/1/2022 12:06:13 AM',
                      '10/1/2022 12:22:19 AM',
                      '10/1/2022 12:23:35 AM',
                      '10/1/2022 12:22:32 AM'],
        'TYPE': ['Short term parking',
                 'Short term parking',
                 'Short term parking',
                 'Short term parking',
                 'Short term parking']
        }
# Create dataframe
df = pd.DataFrame(data)
# Change data type
df['ENTRY DATE'] = pd.to_datetime(df['ENTRY DATE'])
df['EXIT DATE'] = pd.to_datetime(df['EXIT DATE'])
My current solution runs the query below once per row, but it is extremely slow on a dataset with hundreds of thousands of rows:
def get_occ(df):
    count_list = []
    for exit_date in df['EXIT DATE']:
        # Perform query, append count to list
        count = df.query("@exit_date > `ENTRY DATE` & @exit_date <= `EXIT DATE`")['TYPE'].count()
        count_list.append(count)
    # Add counts to df
    df['OCCUPANCY'] = count_list
Desired output for the sample:
           ENTRY DATE           EXIT DATE                TYPE  OCCUPANCY
0 2022-10-01 00:00:09 2022-10-01 00:22:20  Short term parking          3
1 2022-10-01 00:01:49 2022-10-01 00:06:13  Short term parking          3
2 2022-10-01 00:02:54 2022-10-01 00:22:19  Short term parking          4
3 2022-10-01 00:12:30 2022-10-01 00:23:35  Short term parking          1
4 2022-10-01 00:15:24 2022-10-01 00:22:32  Short term parking          2
Any suggestions or alternatives for how I can retrieve the result I am looking for?
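One possible direction is to replace the per-row query with two binary searches over sorted copies of the timestamp columns. This is a minimal sketch and has not been benchmarked. It assumes every row has ENTRY DATE <= EXIT DATE, in which case the number of rows with ENTRY DATE < t <= EXIT DATE equals the number of entries before t minus the number of exits before t. The names `occupancy_at_exit` and `sample` are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd


def occupancy_at_exit(df: pd.DataFrame) -> pd.Series:
    """Count rows with ENTRY DATE < t <= EXIT DATE for each row's exit time t."""
    # Sorted copies of both timestamp columns as NumPy datetime64 arrays.
    entries = np.sort(df["ENTRY DATE"].to_numpy())
    exits = np.sort(df["EXIT DATE"].to_numpy())
    t = df["EXIT DATE"].to_numpy()

    # side="left" returns how many sorted values are strictly less than t.
    entered_before = np.searchsorted(entries, t, side="left")
    exited_before = np.searchsorted(exits, t, side="left")

    # Assumption: ENTRY DATE <= EXIT DATE on every row, so each car that
    # left before t also entered before t, and the difference is the count
    # of cars still inside at t.
    return pd.Series(entered_before - exited_before,
                     index=df.index, name="OCCUPANCY")


# Same five rows as the sample above, written as ISO timestamps.
sample = pd.DataFrame({
    "ENTRY DATE": pd.to_datetime([
        "2022-10-01 00:00:09", "2022-10-01 00:01:49", "2022-10-01 00:02:54",
        "2022-10-01 00:12:30", "2022-10-01 00:15:24"]),
    "EXIT DATE": pd.to_datetime([
        "2022-10-01 00:22:20", "2022-10-01 00:06:13", "2022-10-01 00:22:19",
        "2022-10-01 00:23:35", "2022-10-01 00:22:32"]),
    "TYPE": ["Short term parking"] * 5,
})
sample["OCCUPANCY"] = occupancy_at_exit(sample)
print(sample["OCCUPANCY"].tolist())  # [3, 3, 4, 1, 2]
```

Sorting costs O(n log n) and each lookup is O(log n), whereas the loop above scans the whole frame for every row. If some rows could have an exit earlier than their entry, the subtraction no longer matches the query and those rows would need to be filtered or handled separately.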