I have code that creates CSV files after some operations on the original DataFrame:
import pandas as pd
timetable = pd.read_excel('timetable.xlsx')
data = {"stop_id": timetable['stop_id'], "arrival_time": timetable['arrival_time'], 'route_id': timetable['route_id']}
df = pd.DataFrame(data=data) # Create the DataFrame from the data
g = df.groupby(['stop_id', 'arrival_time']).size()
stops = { i[0] for i in g.index }
for stop in stops:
    times = filter(lambda x: x[0] == stop, g.index)
    data = { "stop_id": [], "arrival_time": [], "number": []}
    for time in times:
        data["stop_id"].append(stop) # add the stop_id
        data["arrival_time"].append(time[1]) # add the current time
        data["number"].append(g[(stop, time[1])]) # add its count
    pd.DataFrame(data=data).to_csv(f"{stop}.csv", index=False)
How should I change the code so that it also includes values from other columns? I have a column route_id, which has different values for each unique stop_id, and I want to list these route_id values for each arrival_time row.
Context: a bus (route_id) arrives at stop_id at a certain arrival_time, but several buses can arrive at the same arrival_time, so I want to know which route_id values arrived at a given time.
The data: https://docs.google.com/spreadsheets/d/1O6QGWZh0Yp2EcJAnlvIJw0xiCH8T1AY_/edit#gid=640877265
An extract from the data:
route_id stop_id arrival_time
429 2179 4/6/22 19:40:00
429 2179 4/6/22 08:06:00
429 2179 4/6/22 09:20:00
429 2179 4/6/22 11:12:00
429 2179 4/6/22 12:25:00
429 2179 4/6/22 13:39:00
429 2179 4/6/22 17:56:00
429 2179 4/6/22 19:19:00
441 2179 4/6/22 07:16:00
441 2179 4/6/22 10:37:00
441 2179 4/6/22 14:33:00
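
To make the desired result concrete, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes that joining the route_id values into a comma-separated string per (stop_id, arrival_time) is acceptable. The function name routes_per_arrival and the output column route_ids are hypothetical names chosen for illustration, and the last sample row (route 441 at 08:06:00) is made up to show two buses arriving at the same time; it is not in the real data extract.

```python
import pandas as pd

# Small sample modelled on the extract above. The last row is hypothetical,
# added only so that two routes share one arrival_time.
sample = pd.DataFrame({
    "route_id": [429, 429, 441, 441],
    "stop_id": [2179, 2179, 2179, 2179],
    "arrival_time": [
        "4/6/22 19:40:00",
        "4/6/22 08:06:00",
        "4/6/22 07:16:00",
        "4/6/22 08:06:00",  # hypothetical collision with route 429
    ],
})


def routes_per_arrival(df):
    """Count arrivals and list route_ids per (stop_id, arrival_time)."""
    return (
        df.groupby(["stop_id", "arrival_time"])["route_id"]
        .agg(
            number="size",  # how many buses arrive at that time
            # distinct routes, joined as a comma-separated string
            route_ids=lambda s: ",".join(map(str, sorted(s.unique()))),
        )
        .reset_index()
    )


def write_per_stop(summary):
    """Write one CSV per stop_id, as the original loop does."""
    for stop, part in summary.groupby("stop_id"):
        part.to_csv(f"{stop}.csv", index=False)


summary = routes_per_arrival(sample)
print(summary)
```

Calling write_per_stop(summary) would then produce one file per stop, with the extra route_ids column next to number. Keeping the routes as a joined string is only one choice; a list per row, or one row per (arrival_time, route_id) pair, may suit other uses better.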