Compare two trees/graphs in Python

Question

First of all, I should confess that I'm very bad at everything graph-related. My trees are implemented as nested dictionaries that represent unweighted Markov chains.

class Tree(object):
    def __init__(self, depth=6, seq=None):
        self.graph = dict()
        self.depth = depth
        if seq is not None:
            self.build(seq)

    def build(self, seq):
        for i in xrange(len(seq)):
            sseq = seq[i:i+self.depth]
            for j in xrange(len(sseq)):
                if j == 0:
                    if sseq[j] not in self.graph:
                        self.graph[sseq[j]] = dict()
                    nest_pointer = self.graph[sseq[j]]
                else:
                    if sseq[j] not in nest_pointer:
                        nest_pointer[sseq[j]] = dict()
                    nest_pointer = nest_pointer[sseq[j]]

What I need is to be able to compare two trees while being aware of the depth at which dissimilarities occur, because I'm gonna use a hierarchical similarity scoring system, so a simple recursive DFS doesn't do the trick.

P.S.

If you can propose a better data structure for my trees, I would appreciate it a lot. I went with dictionaries to get maximum time performance. Thank you in advance.

In regards to you P.S. , there are lots of suggestions on SO that may help you implement a different (I hesitate to say better as I haven't benchmarked it) data structure for your trees. See [this](http://stackoverflow.com/questions/3009935/looking-for-a-good-python-tree-data-structure) , [this](http://stackoverflow.com/questions/2482602/a-general-tree-implementation-in-python) , or [this](http://stackoverflow.com/questions/2358045/how-can-i-implement-a-tree-in-python-are-there-any-built-in-data-structures-in?rq=1) — Muttonchop, Feb 12 '15 at 19:45

score 4 · Accepted Answer · answered Feb 12 '15 at 20:39

Why can't you use a recursive DFS? Just pass the current height in as a parameter. I'm not quite sure how you go about comparing nodes or subtrees, but something like this might work, which just records all the times that two nodes compare unequal (with some user defined comparison nodes_different)

(pseudocode):

def compare_trees_r(node1, node2, depth, result):
    if nodes_different(node1, node2):
        result.append(depth)
    for (pairs of children c1 and c2):
        compare_trees_r(c1, c2, depth + 1, result)

def compare_trees(t1, t2):
    result = []
    compare_trees_r(t1.graph, t2.graph, 0, result)
    return result

As far as your actual data structure goes, it's hard to suggest a more apt structure without knowing what seq is. However, I strongly suggest that you create a class for your nodes, which should make reasoning about your code much easier. If it turns out that that is actually causing a performance problem (after profiling), only then optimize it (premature optimization is the root of all evil, after all).

Well, it might seem obvious, but I haven't though about adding the depth argument to a standard DFS algorithm. I feel sort of silly now. Anyway that solves the problem. Thank you a lot. — Eli Korvigo, Feb 13 '15 at 12:47

Compare two trees/graphs in Python

1 Answers1