
I have several lines defined in a dataframe.

import pandas as pd


df = pd.DataFrame(
    {
        'from': ['p2', 'p3', 'p1'],
        'to': ['p3', 'p4', 'p2'],
    },
    index=['line_b', 'line_c', 'line_a'],
)

# How to get line_sequence as ['line_a', 'line_b', 'line_c']?

Each line has a `from` point and a `to` point. The lines are connected end to end: the `to` point of one line is the `from` point of the next. In this example, the sequence is line_a --> line_b --> line_c.

Could you please show me a quick way to find the connection sequence from the `from` and `to` columns? The point names in the example above contain numbers, like 'p1' and 'p2', but that is only for illustration. In my real application, the names could be any strings, so sorting by name is not an option.

The expected outcome should be in the format of List[str].

Thanks.

aura
  • You need topological sorting: https://stackoverflow.com/questions/47192626/deceptively-simple-implementation-of-topological-sorting-in-python – Psidom Oct 08 '21 at 18:35
  • This looks like a network problem. Try [networkx](https://networkx.org/). – Quang Hoang Oct 08 '21 at 18:51
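The topological-sorting suggestion can be sketched with the standard library's `graphlib` (Python 3.9+). This is only a minimal sketch: the helper name `order_lines` is hypothetical, and it takes plain dicts that mirror `df['from'].to_dict()` and `df['to'].to_dict()`.

```python
from graphlib import TopologicalSorter


def order_lines(start_of, end_of):
    """Return line names ordered so that connected lines follow each other.

    start_of / end_of map line name -> point name (hypothetical inputs,
    e.g. df['from'].to_dict() and df['to'].to_dict()).
    """
    # Group the lines by the point where they end.
    lines_ending_at = {}
    for name, point in end_of.items():
        lines_ending_at.setdefault(point, []).append(name)

    # A line must come after every line that ends at its start point.
    predecessors = {
        name: lines_ending_at.get(point, [])
        for name, point in start_of.items()
    }
    return list(TopologicalSorter(predecessors).static_order())


start_of = {"line_b": "p2", "line_c": "p3", "line_a": "p1"}
end_of = {"line_b": "p3", "line_c": "p4", "line_a": "p2"}
line_sequence = order_lines(start_of, end_of)
print(line_sequence)  # ['line_a', 'line_b', 'line_c']
```

`TopologicalSorter` raises `graphlib.CycleError` if the lines form a loop, so a closed ring of lines needs separate handling.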

1 Answer


Using networkx:

import pandas as pd
import networkx as nx

df = pd.DataFrame({'from': ['p2', 'p3', 'p1'], 'to': ['p3', 'p4', 'p2'], }, index=['line_b', 'line_c', 'line_a'])

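# create a directed graph: points are nodes, each line is an edge
# reset_index() moves the line names into a column called "index",
# which is then stored on each edge as an attribute
df = df.reset_index()
G = nx.from_pandas_edgelist(df, "from", "to", edge_attr="index", create_using=nx.DiGraph)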

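# map each edge (from, to) to its line name
index = nx.get_edge_attributes(G, "index")

# in the line graph every edge of G becomes a node, so a topological
# sort of the line graph orders the lines rather than the points
order = nx.topological_sort(nx.line_graph(G))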

# map to index
res = [index[edge] for edge in order]
print(res)

Output

['line_a', 'line_b', 'line_c']
Dani Mesejo
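If the lines always form a single unbranched chain, as in the example, a dependency-free alternative to networkx is to find the one start point that is never an end point and follow the chain from there. A minimal sketch under that assumption; the helper name `chain_order` is hypothetical, and the dicts stand in for `df['from'].to_dict()` and `df['to'].to_dict()`.

```python
def chain_order(start_of, end_of):
    """Order line names so each line's end point is the next line's start.

    Assumes one unbranched, non-cyclic chain. start_of / end_of map
    line name -> point name (hypothetical inputs).
    """
    # Look up a line by the point where it starts.
    line_from = {point: name for name, point in start_of.items()}

    # The chain begins at the only start point that is never an end point.
    # The tuple unpacking raises ValueError if there is not exactly one.
    (point,) = set(start_of.values()) - set(end_of.values())

    sequence = []
    while point in line_from:
        name = line_from[point]
        sequence.append(name)
        point = end_of[name]  # jump to where this line ends
    return sequence


start_of = {"line_b": "p2", "line_c": "p3", "line_a": "p1"}
end_of = {"line_b": "p3", "line_c": "p4", "line_a": "p2"}
line_sequence = chain_order(start_of, end_of)
print(line_sequence)  # ['line_a', 'line_b', 'line_c']
```

This runs in linear time but does not handle branches or several disconnected chains; for those cases a graph-based topological sort is the safer choice.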