When I group 'time_interval_code'
the values of 'vehicle_real'
in a result file are correct only for the first group, but not for the others. When 'time_interval_code'
was used in previous group it seems that it is not in the sum of the new group. How to make sure 'vehicle _real'
values are available to sum in every group?
The idea of 'time_interval_code'
was to get rid of time format. I have 8 time intervals in the morning (07:00 - 07:15 is 1
, 07:15 - 07:30 - 2
, etc. up to 8
).
I want to check maximum flow rate in an hour by adding 15 minutes each time for every junction and every direction from which cars were entering a junction. The measurements are given in 15 minutes interval, so I need to check 4 intervals every time. Results to be 'junction_id'
, 'source_direction'
and sum of the 'vehicle_real'
for that junction, direction and group of 'time_interval_code'
.
To solve this I created groups that contains 4 time intervals. The problem I have is when I group 'time_interval_code'
the values of 'vehicle_real'
in a result file are correct only for the first group (1,2,3,4)
, but not for the others.
import pandas as pd
data = pd.read_excel("traffic.xlsx")
# Create a DataFrame from the list of data
df = pd.DataFrame(data)
# Define a function to get the morning groups for each time interval code
def get_morning_group(time_interval_code):
morning_groups = [(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6), (4, 5, 6, 7), (5, 6, 7, 8)]
for group in morning_groups:
if time_interval_code in group:
return group
# Add a new column to the DataFrame that contains the morning groups for each time interval code
df['morning_groups'] = df['time_interval_code'].apply(get_morning_group)
# Group data by values
grouped_data = df.groupby(['junction_id', 'source_direction', 'morning_groups'])
# Calculate the sum of the vehicles_real values for each group
grouped_data = grouped_data['vehicles_real'].sum()
# Convert the grouped data back into a DataFrame
df = grouped_data.reset_index()
# Create the pivot table
pivot_table = df.pivot_table(index=['junction_id', 'source_direction'], columns=['morning_groups'], values='vehicles_real')
# Save the pivot table to a new Excel file
pivot_table.to_excel('max_flow_rate.xlsx')
The traffic.xlsx
file has ca. 140k records. Every junction has at least 2 'source_direction'
values. Junction with 'source_direction'
has 'vehicles_real'
values for every 'time_interval_code'
. The file looks like this:
id | time_interval_code | junction_id | source_direction | vehicles_real |
---|---|---|---|---|
1 | 3 | 1001 | N | 140 |
2 | 1 | 2002 | E | 10 |
18 | 2 | 2011 | W | 41 |
21 | 5 | 2030 | S | 2 |
33 | 8 | 2030 | N | 140 |
35 | 7 | 2150 | E | 10 |
41 | 6 | 2150 | W | 41 |
52 | 5 | 2150 | S | 2 |
The output I get is fine, but the values are correct only for (1,2,3,4)
.
junction_id | source_direction | (1,2,3,4) | (2,3,4,5) | (3,4,5,6) | (4,5,6,7) | (5,6,7,8) |
---|---|---|---|---|---|---|
1001 | N | 257 | 95 | 69 | 61 | 59 |
1001 | S | 456 | 120 | 136 | 153 | 111 |
1002 | N | 2597 | 676 | 670 | 619 | 645 |
1002 | S | 2571 | 552 | 641 | 656 | 595 |
1003 | N | 586 | 181 | 148 | 127 | 142 |
1003 | S | 711 | 174 | 147 | 157 | 141 |