0

When I group 'time_interval_code' the values of 'vehicle_real' in a result file are correct only for the first group, but not for the others. When 'time_interval_code' was used in previous group it seems that it is not in the sum of the new group. How to make sure 'vehicle _real' values are available to sum in every group?

The idea of 'time_interval_code' was to get rid of time format. I have 8 time intervals in the morning (07:00 - 07:15 is 1, 07:15 - 07:30 - 2, etc. up to 8).

I want to check maximum flow rate in an hour by adding 15 minutes each time for every junction and every direction from which cars were entering a junction. The measurements are given in 15 minutes interval, so I need to check 4 intervals every time. Results to be 'junction_id', 'source_direction' and sum of the 'vehicle_real' for that junction, direction and group of 'time_interval_code'.

To solve this I created groups that contains 4 time intervals. The problem I have is when I group 'time_interval_code' the values of 'vehicle_real' in a result file are correct only for the first group (1,2,3,4), but not for the others.

import pandas as pd
data = pd.read_excel("traffic.xlsx")

# Create a DataFrame from the list of data
df = pd.DataFrame(data)

# Define a function to get the morning groups for each time interval code
def get_morning_group(time_interval_code):
  morning_groups = [(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6), (4, 5, 6, 7), (5, 6, 7, 8)]

  for group in morning_groups:
    if time_interval_code in group:
      return group

# Add a new column to the DataFrame that contains the morning groups for each time interval code
df['morning_groups'] = df['time_interval_code'].apply(get_morning_group)

# Group data by values
grouped_data = df.groupby(['junction_id', 'source_direction', 'morning_groups'])

# Calculate the sum of the vehicles_real values for each group
grouped_data = grouped_data['vehicles_real'].sum()

# Convert the grouped data back into a DataFrame
df = grouped_data.reset_index()

# Create the pivot table
pivot_table = df.pivot_table(index=['junction_id', 'source_direction'], columns=['morning_groups'], values='vehicles_real')

# Save the pivot table to a new Excel file
pivot_table.to_excel('max_flow_rate.xlsx')

The traffic.xlsx file has ca. 140k records. Every junction has at least 2 'source_direction' values. Junction with 'source_direction' has 'vehicles_real' values for every 'time_interval_code'. The file looks like this:

id time_interval_code junction_id source_direction vehicles_real
1 3 1001 N 140
2 1 2002 E 10
18 2 2011 W 41
21 5 2030 S 2
33 8 2030 N 140
35 7 2150 E 10
41 6 2150 W 41
52 5 2150 S 2

The output I get is fine, but the values are correct only for (1,2,3,4).

junction_id source_direction (1,2,3,4) (2,3,4,5) (3,4,5,6) (4,5,6,7) (5,6,7,8)
1001 N 257 95 69 61 59
1001 S 456 120 136 153 111
1002 N 2597 676 670 619 645
1002 S 2571 552 641 656 595
1003 N 586 181 148 127 142
1003 S 711 174 147 157 141
mixx
  • 1
  • 2
  • 1
    Welcome to stack overflow! Please have a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and [edit] your question to include a [mcve] showing a sample of your current input, your current output, and your expected output based on that input, so that we can better understand and reproduce your issue – G. Anderson Dec 12 '22 at 17:41
  • `data = pd.read_excel("traffic.xlsx")` .... please create a sample of data that users can replicate. – D.L Dec 12 '22 at 17:57
  • Thank you, I tried to desribe the issue with more details and added both input and output samples. – mixx Dec 12 '22 at 18:58

0 Answers0