I have a DataFrame routes with the following structure:
    id    nodes                                             traveltimes
0   id-1  [node-A, node-B]                                  [6.0]
1   id-2  [node-A, node-C, node-D, node-E]                  [4.0, 80.0, 38.0]
2   id-3  [node-B, node-D]                                  [90.0]
3   id-4  [node-A]                                          []
4   id-5  [node-A, node-B, node-C, node-D, node-E, node-D]  [35.0, 30.0, 110.0, 20.0, 5.0]
..  ...   ...
The lists in the nodes column are the nodes of a graph, and the values in the traveltimes column are the travel times between consecutive nodes. Each row corresponds to a route in the graph.
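For reproducibility, the five rows shown above could be built as follows. This is only a sketch of the sample data: the rows elided by `..` are not included, and the columns are assumed to hold plain Python lists.

```python
import pandas as pd

# Sample data copied from the table above (elided rows are omitted).
routes = pd.DataFrame({
    "id": ["id-1", "id-2", "id-3", "id-4", "id-5"],
    "nodes": [
        ["node-A", "node-B"],
        ["node-A", "node-C", "node-D", "node-E"],
        ["node-B", "node-D"],
        ["node-A"],
        ["node-A", "node-B", "node-C", "node-D", "node-E", "node-D"],
    ],
    "traveltimes": [
        [6.0],
        [4.0, 80.0, 38.0],
        [90.0],
        [],
        [35.0, 30.0, 110.0, 20.0, 5.0],
    ],
})

# A route with n nodes has n - 1 travel times.
assert all(
    len(nodes) == len(times) + 1
    for nodes, times in zip(routes["nodes"], routes["traveltimes"])
)
```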
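I want to split my routes on a threshold value of traveltimes: whenever a travel time reaches the threshold, the route is cut at that point and that travel time is dropped. For example, for a threshold of 70, I want to get the following result: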
    id    route_id  nodes                     traveltimes
0   id-1  0         [node-A, node-B]          [6.0]
1   id-2  0         [node-A, node-C]          [4.0]
2   id-2  1         [node-D, node-E]          [38.0]
3   id-3  0         [node-B]                  []
4   id-3  1         [node-D]                  []
5   id-4  0         [node-A]                  []
6   id-5  0         [node-A, node-B, node-C]  [35.0, 30.0]
7   id-5  1         [node-D, node-E, node-D]  [20.0, 5.0]
..  ...   ...
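The expected result for the five sample rows can also be written down as a DataFrame, which may be handy as a reference when checking a candidate solution. The name `expected` is just a placeholder, and the threshold of 70 is the one from the example.

```python
import pandas as pd

# Expected split for a threshold of 70, copied from the table above.
expected = pd.DataFrame({
    "id": ["id-1", "id-2", "id-2", "id-3", "id-3", "id-4", "id-5", "id-5"],
    "route_id": [0, 0, 1, 0, 1, 0, 0, 1],
    "nodes": [
        ["node-A", "node-B"],
        ["node-A", "node-C"],
        ["node-D", "node-E"],
        ["node-B"],
        ["node-D"],
        ["node-A"],
        ["node-A", "node-B", "node-C"],
        ["node-D", "node-E", "node-D"],
    ],
    "traveltimes": [
        [6.0], [4.0], [38.0], [], [], [], [35.0, 30.0], [20.0, 5.0],
    ],
})

# Sanity checks: no remaining travel time reaches the threshold,
# and each segment still has one more node than travel times.
assert all(t < 70 for times in expected["traveltimes"] for t in times)
assert all(
    len(nodes) == len(times) + 1
    for nodes, times in zip(expected["nodes"], expected["traveltimes"])
)
```

A candidate result with the same columns could then be compared with `candidate.to_dict("records") == expected.to_dict("records")`.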
I wrote the following code, which does what I want, but in an inefficient way.
I have a function that splits a route:
import pandas as pd

def split_routes(row):
    newrow = row.copy()
    threshold = 70
    nodes = newrow['nodes']
    traveltimes = newrow['traveltimes']
    rows = []
    route_id = 0
    route_nodes = []
    route_traveltimes = []
    route_nodes.append(nodes[0])
    for i in range(1, len(nodes)):
        if traveltimes[i-1] < threshold:
            route_traveltimes.append(traveltimes[i-1])
            route_nodes.append(nodes[i])
        else:
            # Route route_id completed, starting a new one
            newrow['route_id'] = route_id
            newrow['nodes'] = route_nodes
            newrow['traveltimes'] = route_traveltimes
            rows.append(newrow)
            newrow = row.copy()
            route_nodes = []
            route_traveltimes = []
            route_id += 1
            route_nodes.append(nodes[i])
    # Route route_id completed
    newrow['route_id'] = route_id
    newrow['nodes'] = route_nodes
    newrow['traveltimes'] = route_traveltimes
    rows.append(newrow)
    df = pd.DataFrame(rows)
    return df
And this is how I use it:
splitted_routes_array = []
for index, row in routes.iterrows():  # Inefficient loop
    splitted_routes_array.append(split_routes(row))
splitted_routes = pd.concat(splitted_routes_array).reset_index(drop=True)
I guess I could do something much more efficient without iterating over the rows myself, but I couldn't figure out how to use apply to return multiple rows and columns at the same time.
Can someone give me some hints on that?
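One possible direction, offered as a rough sketch rather than a benchmarked answer: collect the segments as plain tuples and build a single DataFrame at the end, so that no small DataFrame is created per row and no concat is needed. It uses a list comprehension rather than apply. The helper `split_route` and the name `split_df` are hypothetical, and the sketch assumes that only the id column has to be carried over (other columns could be merged back on id).

```python
import pandas as pd

def split_route(nodes, traveltimes, threshold=70):
    """Hypothetical helper: cut one route into (nodes, traveltimes) segments."""
    segments = [([nodes[0]], [])]
    for node, time in zip(nodes[1:], traveltimes):
        if time < threshold:
            segments[-1][1].append(time)  # short hop: stay in the current segment
        else:
            segments.append(([], []))     # long hop: drop it and start a new segment
        segments[-1][0].append(node)
    return segments

# Two rows from the sample data, enough to exercise the split.
routes = pd.DataFrame({
    "id": ["id-2", "id-4"],
    "nodes": [["node-A", "node-C", "node-D", "node-E"], ["node-A"]],
    "traveltimes": [[4.0, 80.0, 38.0], []],
})

# One tuple per output row; route_id is the position of the segment in its route.
records = [
    (rid, k, seg_nodes, seg_times)
    for rid, nodes, times in zip(routes["id"], routes["nodes"], routes["traveltimes"])
    for k, (seg_nodes, seg_times) in enumerate(split_route(nodes, times))
]
split_df = pd.DataFrame(records, columns=["id", "route_id", "nodes", "traveltimes"])
print(split_df)
```

As in the original function, a travel time exactly equal to the threshold also triggers a split, and a row with an empty nodes list would raise an IndexError.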