0

I have a list of parent-child relations where the structure isn't a true tree. Some parents can have many children and also some children can have more than one parent.

import pandas as pd    
df = pd.DataFrame([[123,234],[123,235],[123,236],[124,236],[234,345],[236,346]], columns=['Parent','Child'])*

I would like to group all children for specific ancestors. From the data:

123,234,235,236,345,346
124,235,346

Should be the correct groups.

I tried with:

parents = set()
children = {}
for p, c in df.to_records(index=False).tolist():
    parents.add(p)
    children[c] = p


def getAncestors(p):
    return (getAncestors(children[p]) if p in children else []) + [p]

But on 346 it only returns one group.

Also, how to then find all children for 123 and 124?

Thank you!

  • 1
    Did you considered using a lib like networkx ? – Plopp Jul 25 '18 at 12:52
  • 1
    Determine nodes that are not the child of any other node, then do DFS from each of those? – tobias_k Jul 25 '18 at 12:55
  • Actually, solved the problem using networkx. Created MultiDiGraph, converted to Graph to create clusters and than applied nx.ancestors to get the parents. Thanks for pointing me in this direction! – Aleš Juvančič Aug 09 '18 at 12:45

1 Answers1

3

As you said, it's not really a tree, but more like a directed acyclic graph, so you can't map each child to just one parent; it'd have to be a list of parents. Also, given your use case, I'd suggest mapping parents to their lists of children instead.

relations = [[123,234],[234,345],[123,235],[123,236],[124,236],[236,346]]

children = {}
for p, c in relations:
    children.setdefault(p, []).append(c)
roots = set(children) - set(c for cc in children.values() for c in cc)

You can then use a recursive function similar to the one you already have to get all the children to a given root node (or any parent node). The root itself is not in the list, but can easily be added.

def all_children(p):
    if p not in children:
        return set()
    return set(children[p] + [b for a in children[p] for b in all_children(a)])

print({p: all_children(p) for p in roots})
# {123: {234, 235, 236, 345, 346}, 124: {346, 236}}
tobias_k
  • 81,265
  • 12
  • 120
  • 179