How to build a nested tree structure from a list of adjacencies?

Question

Considering that I have:

a list of adjacent keys (child - parent) named A
a tree class named Tree storing its own node key (integer) and children (classes)

A = [(61, 66), (50, 61), (68, 61), (33, 61), (57, 66), (72, 66), (37, 68), (71, 33), (6, 50), (11, 37), (5, 37)]

class Tree:
    def __init__(self, node, *children):
        self.node = node
        if children: self.children = children
        else: self.children = []
    
    def __str__(self): 
        return "%s" % (self.node)
    def __repr__(self):
        return "%s" % (self.node)

    def __getitem__(self, k):
        if isinstance(k, int) or isinstance(k, slice): 
            return self.children[k]
        if isinstance(k, str):
            for child in self.children:
                if child.node == k: return child

    def __iter__(self): return self.children.__iter__()

    def __len__(self): return len(self.children)

How can I build a Tree object such that it encapsulates all the inner trees in accordance with the adjacencies ? (like the following)

t = Tree(66, 
        Tree(72), 
        Tree(57), 
        Tree(61, 
            Tree(33,
                Tree(71)), 
            Tree(50, 
                Tree(6)), 
            Tree(68, 
                Tree(37, 
                    Tree(11), Tree(5)))))

I was thinking about creating the tree in a recursive way but I can not figure out how to traverse and populate it properly. Here is my failed attempt:

from collections import defaultdict

# Create a dictionary: key = parent, values = children
d = defaultdict(list)
for child, parent in A:
    d[parent].append(child)

# Failed attempt
def build_tree(k):    
    if k in d:
        tree = Tree(k, d[k]) #1st issue: should input a Tree() as 2nd parameter
        for child in d[k]:
            build_tree(child) #2nd issue: should populate tree, not iterate recursively over children keys

#I know that the root node is 66.
full_tree = build_tree(66)

First up, how do you know the graph will be a valid tree with the given node? Do you check that? Then are you building it in the constructor, or in a build_tree function? Tree appears to only have one node, and the build_tree function can't do anything because the class doesn't provide a way to add nodes. I'm not sure you're quite ready to post a specific question. — Kenny Ostrom, Sep 12 '20 at 20:48
Ideally, the tree should be built with a function (build_tree). Trees are only storing their own node/key (integer) along with their children (list of trees). Final output should be a tree containing trees (children), containing trees (children of children)...etc. Question has been updated. — solub, Sep 12 '20 at 22:03

score 2 · Answer 1 · answered Sep 13 '20 at 00:11

You're close. The critical thing is to return the new node back to the parent and append it to the parent node's children list. If your parent list is fixed upon initialization, simply use a temporary list, then create the parent after visiting and creating the children.

Here's a minimal example:

from collections import defaultdict, namedtuple

def build_tree(tree, root):
    if root:
        return Node(root, [build_tree(tree, x) for x in tree.get(root, [])])

def print_tree(root, indent=0):
    if root:
        print(" " * indent + str(root.val))
        
        for child in root.children:
            print_tree(child, indent + 2)

if __name__ == "__main__":
    A = [(61, 66), (50, 61), (68, 61), (33, 61), (57, 66), (72, 66), 
         (37, 68), (71, 33), (6, 50), (11, 37), (5, 37)]
    Node = namedtuple("Node", "val children")
    nodes = defaultdict(list)
    
    for child, parent in A:
        nodes[parent].append(child)

    print_tree(build_tree(nodes, 66))

Output:

Your answer put me on the right track but trincot's explanations and suggestions allowed to fully address the question. Thank you for your time. — solub, Sep 13 '20 at 10:53

score 2 · Accepted Answer · answered Sep 13 '20 at 07:41

You mention two issues in this piece of code:

    tree = Tree(k, d[k]) #1st issue: should input a Tree() as 2nd parameter
    for child in d[k]:
        build_tree(child) #2nd issue: should populate tree, not iterate recursively over children keys

You can solve them by essentially moving the for loop into the second argument, in the form of list comprehension and splashing that list so they become arguments. And then make sure your recursive function returns the created tree:

    return Tree(k, 
        *[build_tree(child) for child in d[k]]
    )

More ideas

Unrelated to your question, but here are some more ideas you could use.

It would be advisable to make your code a function to which you can pass A as argument, so that also the dictionary's scope is just local to that function and does not litter the global scope.
As this feature is strongly related to the Tree class, it would be nice to define it as a static or class method within the class.
When you have the (child, parent) tuples for the tree, then these implicitly define which node is the root, so you could omit passing the literal 66 to your function. That function should be able to find out which is the root by itself. While creating the dictionary it can also collect which nodes have a parent. The root is then the node that is not in that collection.

So taking all that together you would have this:

from collections import defaultdict

class Tree:
    def __init__(self, node, *children):
        self.node = node
        self.children = children if children else []
    
    def __str__(self): 
        return "%s" % (self.node)
    
    def __repr__(self):
        return "%s" % (self.node)

    def __getitem__(self, k):
        if isinstance(k, int) or isinstance(k, slice): 
            return self.children[k]
        if isinstance(k, str):
            for child in self.children:
                if child.node == k:
                    return child

    def __iter__(self):
        return self.children.__iter__()

    def __len__(self):
        return len(self.children)

    @classmethod
    def from_pairs(Cls, pairs):
        # Turn pairs into nested dictionary
        d = defaultdict(list)
        children = set()
        for child, parent in pairs:
            d[parent].append(child)
            # collect nodes that have a parent
            children.add(child)
        
        # Find root: it does not have a parent
        root = next(parent for parent in d if parent not in children)

        # Build nested Tree instances recursively from the dictionary
        def subtree(k):
            return Cls(k, *[subtree(child) for child in d[k]])

        return subtree(root)

# Sample run
A = [(61, 66), (50, 61), (68, 61), (33, 61), (57, 66), (72, 66), (37, 68), (71, 33), (6, 50), (11, 37), (5, 37)]

tree = Tree.from_pairs(A)

Clear and informative answer. I learned a few things from your comprehensive explanations. Thank you very much. Also boiled down `root` to `set(d).difference(*d.values()).pop()` — solub, Sep 13 '20 at 10:48

Mulan · Answer 3 · 2020-09-13T17:45:09.990

Here's an opportunity to learn about reusable modules and mutual recursion. This solution in this answer solves your specific problem without any modification of the modules written in another answer¹. This is an important thing to point out because it shows how generic functions promote code reuse and lessen the chance for bugs to creep into your program.

First we will define functions that are specific to the shape of your (id, parent) input structure -

# main.py

def id(node):
  return node[0]

def parent(node):
  return node[1]

n = (12,34)

id(n)     # => 12
parent(n) # => 34

And maybe you know that the root node is 66, but that's hard for our program to infer and easy for us to define. Let's explicitly include (66, None) in your input data, where parent=None signifies a root node -

A = \
  [ (61, 66), (50, 61), (68, 61), (33, 61)
  , (57, 66), (72, 66), (37, 68), (71, 33)
  , (6, 50), (11, 37), (5, 37), (66, None) # don't forget root node, 66
  ]

Now we can use the tree module to construct our tree with ease -

# main.py

from tree import tree

def id #...
def parent #...

A = [ ... ]

B = tree \
  ( A                                # list of nodes
  , parent                           # foreign key
  , lambda node, children:           # node reconstructor
      (id(node), children(id(node))) # primary key 
  )

print(B)
# [(66, [(61, [(50, [(6, [])]), (68, [(37, [(11, []), (5, [])])]), (33, [(71, [])])]), (57, []), (72, [])])]

Notice how tree does not concern itself with the shape of your input; any node structure can be used. The tree function is flexible, and allows us to construct tree nodes in a shape completely different from the input nodes -

# main.py

from tree import tree
from json import dumps

def id #...
def parent #...

A = [ ... ]

C = tree \
  ( A
  , parent
  , lambda node, children:
      dict([("id", id(node)), ("children", children(id(node)))])
  )

print(dumps(C))

[ { "id": 66
  , "children":
      [ { "id": 61
        , "children":
            [ { "id": 50
              , "children":
                  [ { "id": 6, "children": [] }
                  ]
              }
            , { "id": 68
              , "children":
                [ { "id": 37
                  , "children":
                      [ { "id": 11, "children": [] }
                      , { "id": 5, "children": [] }
                      ]
                  }
                ]
              }
            , { "id": 33
              , "children":
                  [ { "id": 71, "children": [] }
                  ]
              }
            ]
        }
      , { "id": 57, "children": [] }
      , { "id": 72, "children": [] }
      ]
  }
]

Now we can look at the implementation of tree. Notice how tree makes no assumptions about the shape of the input nodes -

# tree.py

from index import index, get

def empty():
  return []

def tree (all, indexer, maker, root = None):
  mem = index(all, indexer)

  def many(all):
    return list(map(one, all))
  
  def one(single):
    return maker(single, lambda r: many(get(mem, r, empty())))
  
  return many(get(mem, root))

Our implementation of tree depends on another module, index. Grouping data structures, like index, along with functions that operate on those data structures is a good way to draw boundaries between modules. No assumptions about input shape made here either -

# index.py

from functools import reduce

def empty():
  return {}

def update(t, k, f):
  if k in t:
    return { **t, k: f(get(t, k)) }
  else:
    return { **t, k: f() }

def get(t, k, default = None):
  if k in t:
    return t[k]
  else:
    return default

def append(t, k, v):
  return update(t, k, lambda r = []: [ *r, v ])

def index(ls, indexer):
  return reduce \
    ( lambda t, v: append(t, indexer(v), v)
    , ls
    , empty()
    )

Verify our results by running it in your browser: run this program on repl.it

¹ Modules ported to Python. Original program written in JavaScript.

How to build a nested tree structure from a list of adjacencies?

3 Answers3

More ideas

Linked