1

I have a dataframe with two columns as:

CHILD   PARENT
1       2
2       3
3       4
10      11
11      12

I need to create a dictionary where I keep the top parent as key and all its descendents as set of values, like:

4: [1,2,3]
12: [10,11]

I have been able to extract 12 and 4 as top parents from this dataframe, by code from following link:

extract column value based on another column pandas dataframe

Now, I have no idea how to do this in python. In java I can do this by following a dfs. Any suggestions?

brentertainer
  • 2,118
  • 1
  • 6
  • 15
SarahB
  • 318
  • 1
  • 4
  • 18
  • 1
    I can only think of using networkx as pandas isn't cut out for data structures to be specific. You can construct a graph from this relationship and then, from there, all you need is finding various subgraphs that are disconnected with each other. – LazyCoder Jul 27 '19 at 02:09
  • 1
    The graphs are guaranteed to be acyclic? And can a node be a descendant of more than one parent? – brentertainer Jul 27 '19 at 02:24

2 Answers2

3

Here is the way from networkx

import networkx as nx
G=nx.from_pandas_edgelist(df, 'CHILD', 'PARENT')
l=list(nx.connected_components(G))
L=[dict.fromkeys(y,x) for x, y in enumerate(l)]
d={k: v for d in L for k, v in d.items()}
df.groupby(df.CHILD.map(d)).agg({'CHILD':'unique','PARENT':'max'})
Out[328]: 
       PARENT      CHILD
CHILD                   
0           4  [1, 2, 3]
1          12   [10, 11]
BENY
  • 317,841
  • 20
  • 164
  • 234
1

Here is a BFS approach not based on networkx which is a great Python package, though not part of the Python standard library.

Code:

from collections import defaultdict

import pandas as pd

df = pd.DataFrame(data=[[1, 2], [2, 3], [3, 4], [10, 11], [11, 12]],
                  columns=['CHILD', 'PARENT'])

# build graph
graph = defaultdict(set)
for child, parent in df[['CHILD', 'PARENT']].values:
    graph[parent].add(child)

# identity root nodes
roots = []
for node in graph.keys():
    if all(node not in children for children in graph.values()):
        roots.append(node)

# find the descendents of each root node
result = {}
for root in roots:
    visited = set()
    unvisited = graph[root]
    while unvisited:
        visited |= unvisited
        unvisited = set.union(*(graph[node] for node in unvisited)) - visited
    result[root] = visited

print(result)

Output:

{4: {1, 2, 3}, 12: {10, 11}}
brentertainer
  • 2,118
  • 1
  • 6
  • 15