Values summed in a previous group won't sum in a new group

Question

When I group 'time_interval_code' the values of 'vehicle_real' in a result file are correct only for the first group, but not for the others. When 'time_interval_code' was used in previous group it seems that it is not in the sum of the new group. How to make sure 'vehicle _real' values are available to sum in every group?

The idea of 'time_interval_code' was to get rid of time format. I have 8 time intervals in the morning (07:00 - 07:15 is 1, 07:15 - 07:30 - 2, etc. up to 8).

I want to check maximum flow rate in an hour by adding 15 minutes each time for every junction and every direction from which cars were entering a junction. The measurements are given in 15 minutes interval, so I need to check 4 intervals every time. Results to be 'junction_id', 'source_direction' and sum of the 'vehicle_real' for that junction, direction and group of 'time_interval_code'.

To solve this I created groups that contains 4 time intervals. The problem I have is when I group 'time_interval_code' the values of 'vehicle_real' in a result file are correct only for the first group (1,2,3,4), but not for the others.

import pandas as pd
data = pd.read_excel("traffic.xlsx")

# Create a DataFrame from the list of data
df = pd.DataFrame(data)

# Define a function to get the morning groups for each time interval code
def get_morning_group(time_interval_code):
  morning_groups = [(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6), (4, 5, 6, 7), (5, 6, 7, 8)]

  for group in morning_groups:
    if time_interval_code in group:
      return group

# Add a new column to the DataFrame that contains the morning groups for each time interval code
df['morning_groups'] = df['time_interval_code'].apply(get_morning_group)

# Group data by values
grouped_data = df.groupby(['junction_id', 'source_direction', 'morning_groups'])

# Calculate the sum of the vehicles_real values for each group
grouped_data = grouped_data['vehicles_real'].sum()

# Convert the grouped data back into a DataFrame
df = grouped_data.reset_index()

# Create the pivot table
pivot_table = df.pivot_table(index=['junction_id', 'source_direction'], columns=['morning_groups'], values='vehicles_real')

# Save the pivot table to a new Excel file
pivot_table.to_excel('max_flow_rate.xlsx')

The traffic.xlsx file has ca. 140k records. Every junction has at least 2 'source_direction' values. Junction with 'source_direction' has 'vehicles_real' values for every 'time_interval_code'. The file looks like this:

id	time_interval_code	junction_id	source_direction	vehicles_real
1	3	1001	N	140
2	1	2002	E	10
18	2	2011	W	41
21	5	2030	S	2
33	8	2030	N	140
35	7	2150	E	10
41	6	2150	W	41
52	5	2150	S	2

The output I get is fine, but the values are correct only for (1,2,3,4).

junction_id	source_direction	(1,2,3,4)	(2,3,4,5)	(3,4,5,6)	(4,5,6,7)	(5,6,7,8)
1001	N	257	95	69	61	59
1001	S	456	120	136	153	111
1002	N	2597	676	670	619	645
1002	S	2571	552	641	656	595
1003	N	586	181	148	127	142
1003	S	711	174	147	157	141

Welcome to stack overflow! Please have a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and [edit] your question to include a [mcve] showing a sample of your current input, your current output, and your expected output based on that input, so that we can better understand and reproduce your issue — G. Anderson, Dec 12 '22 at 17:41
`data = pd.read_excel("traffic.xlsx")` .... please create a sample of data that users can replicate. — D.L, Dec 12 '22 at 17:57
Thank you, I tried to desribe the issue with more details and added both input and output samples. — mixx, Dec 12 '22 at 18:58

Values summed in a previous group won't sum in a new group

0 Answers0