How to find the set of trees every one of which spans over another given tree?

Question

Imagine it's given a set of trees ST and each vertex of every tree is labeled. Also another tree T is given (also with labels vertices). The question is how can I find which trees of the ST can span over the tree T starting from the root of T in such a way that the labels of the vertices of the spanning tree T' coincide with those labels of T 's vertices. Note that the children of every vertex of T should be either completely covered or not covered at all - partial covering of children is not allowed. Stated in other words: Given a tree and the following procedure: pick a vertex and remove all vertices and edges below this vertex (except the vertex itself). Find those trees of ST such that each tree is generated with a series of procedures applied to T. For example given the tree T

enter image description here

the trees

enter image description here

cover T and the tree

enter image description here

does not because this tree has children 3, 5 unlike T which has 2, 3 as children. The best thing I was able to think of was either to brute force it or to find the set of tree every one of which has the same root label as T and then to search for the answer among those trees but I guess neither of those two approaches is the optimal one. I was thinking of somehow hashing the trees but nothing came out. Any thoughts?

Notes:

The trees are not necessarily binary
A tree T can cover another tree T' if they share a root
The tree is ordered meaning that you cannot swap the position of any two children.

TL; DR Find a efficient algorithm which on query with given tree T the algorithm finds all trees from a given(fixed/static) set ST which are able to cover T.

Can you flesh out your explanation of why your examples do and do not satisfy your problem constraints? — Richard, Sep 23 '13 at 14:47
Also, do the nodes of your trees always have at most 2 children? — Richard, Sep 23 '13 at 14:54
Do I have this right? You are given many **T** that you want to check against **ST**. The "brute force" method you speak of would check **T** against **ST** on a per-node basis, but if **ST** is large, this would potentially require many computations - so you want a way to avoid accessing the nodes of **ST** for every **T**? — Richard, Sep 23 '13 at 15:01
@Richard Exactly! My query is the form of given **T** find those tree from **ST** each of which covers **T**. You may say that **ST** is fixed(static) and on each query **T** is received and trying to avoid as much computations as possible. — sve, Sep 23 '13 at 15:06
Got it - I can see a way to do this, but I haven't quite worked out the algorithm yet... — Richard, Sep 23 '13 at 15:10
Another question: for a **T** to match a member **s** of **ST** do they have to share a root, or is it sufficient for a **T** to cover a sub-tree of **s**? — Richard, Sep 23 '13 at 15:19
Another question: should the order of the children of a given node be considered interchangeable? That is, the trees `(1, (2, 3))` and `(1, (3, 2))` are the same? — Richard, Sep 23 '13 at 15:59
@Richard Every node's children are ordered meaning you cannot swap the position of any two. Those trees are different. — sve, Sep 23 '13 at 16:01

score 1 · Accepted Answer · edited May 23 '17 at 11:53

I'll sketch an answer and then provide some working source code.

First off, you need an algorithm to hash a tree. We can assume, without loss of generality, that the children of each of your tree's nodes are ordered from least to greatest (or vice versa).

Run this algorithm on every member of ST and save the hashes.

Now, take your test tree T and generate all of its subtrees TP that retain the original root. You can do this (perhaps inefficiently) by:

Making a set S of its nodes
Generating the power set P of S
Generating the subtrees by removing the nodes present in each member of P from copies of T
Adding those subtrees which retain the original root to TP.

Now generate a set of all of the hashes of TP.

Now check each of your ST hashes for membership in TP.

ST hash storage requires O(n) space in ST, and possibly the space to hold the trees.

You can optimize the membership code so that it requires no storage space (I have not done this in my test code). The code will require approximately 2N checks, where N is the number of nodes in **T.

So the algorithm runs in O(H 2**N), where H is the size of ST and N is the number of nodes in T. The best way of speeding this up is to find an improved algorithm for generating the subtrees of T.

The following Python code accomplishes this:

#!/usr/bin/python
import itertools
import treelib
import Crypto.Hash.SHA
import copy

#Generate a hash of a tree by recursively hashing children
def HashTree(tree):
  digester=Crypto.Hash.SHA.new()

  digester.update(str(tree.get_node(tree.root).tag))

  children=tree.get_node(tree.root).fpointer
  children.sort(key=lambda x: tree.get_node(x).tag, cmp=lambda x,y:x-y)

  hash=False
  if children:
    for child in children:
      digester.update(HashTree(tree.subtree(child)))
    hash = "1"+digester.hexdigest()
  else:
    hash = "0"+digester.hexdigest()

  return hash

#Generate a power set of a set
def powerset(iterable):
  "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
  s = list(iterable)
  return itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(len(s)+1))

#Generate all the subsets of a tree which still share the original root
#by using a power set of all the tree's nodes to remove nodes from the tree
def TreePowerSet(tree):
  nodes=[x.identifier for x in tree.nodes.values()]
  ret=[]
  for s in powerset(nodes):
    culled_tree=copy.deepcopy(tree)
    for n in s:
      try:
        culled_tree.remove_node(n)
      except:
        pass
    if len([x.identifier for x in culled_tree.nodes.values()])>0:
      ret.append(culled_tree)

  return ret

def main():
  ST=[]

  #Generate a member of ST
  treeA = treelib.Tree()
  treeA.create_node(1,1)
  treeA.create_node(2,2,parent=1)
  treeA.create_node(3,3,parent=1)
  ST.append(treeA)

  #Generate a member of ST
  treeB = treelib.Tree()
  treeB.create_node(1,1)
  treeB.create_node(2,2,parent=1)
  treeB.create_node(3,3,parent=1)
  treeB.create_node(4,4,parent=2)
  treeB.create_node(5,5,parent=2)
  ST.append(treeB)

  #Generate hashes for members of ST
  hashes=[(HashTree(tree), tree) for tree in ST]

  print hashes

  #Generate a test tree
  T=treelib.Tree()
  T.create_node(1,1)
  T.create_node(2,2,parent=1)
  T.create_node(3,3,parent=1)
  T.create_node(4,4,parent=2)
  T.create_node(5,5,parent=2)
  T.create_node(6,6,parent=3)
  T.create_node(7,7,parent=3)

  #Generate all the subtrees of this tree which still retain the original root
  Tsets=TreePowerSet(T)

  #Hash all of the subtrees
  Thashes=set([HashTree(x) for x in Tsets])

  #For each member of ST, check to see if that member is present in the test
  #tree
  for hash in hashes:
    if hash[0] in Thashes:
      print [x for x in hash[1].expand_tree()]

main()

n. m. could be an AI · Answer 2 · 2013-09-23T18:50:48.827

0

To verify that one tree covers another, one must look at all vertices of the first tree at least once. It is trivial to verify that a tree covers another by looking at all vertices of the first tree exactly once. Thus the simplest possible algorithm is already optimal, if it's only needed to check one tree.

Everything below are untested fruits of my sick imagination.

If there are many possible T that must be checked against the same ST, then it's possible to store trees of ST as sets of facts like these

root = 1
children of node 1 = (2, 3)
children of node 2 = ()
children of node 3 = ()

These facts can be stored in a standard relational DB in two tables, "roots" (fields "tree" and rootnode") and "branches" (fields "tree", "node" and "children"). then an SQL query or a series of queries can be built to find matching trees quickly. My SQL-fu is rudimentary so I could not manage it in a single query, but I'm believe it should be possible.

edited Sep 23 '13 at 18:50

answered Sep 23 '13 at 14:59

n. m. could be an AI

112,515
14
128
243

I'm not sure that the brute force algorithm is the optimal one. Lets consider the set of trees ST = {(1), (2), ...., (2 ^ 30)} and lets query for huge number of times the tree (1 (2)). In this situation on each query you would have to check all 2 ^ 30 trees. But for example if you hash the roots of trees ST, i.e. to have a dictionary d and d[k] gives you all the trees in ST with a root k, it will be constant time query. – sve Sep 23 '13 at 16:23
You should have said what's the fixed part of the problem and what's the variable part you are averaging over. It's not clear at all from the question. See update. – n. m. could be an AI Sep 23 '13 at 18:52

How to find the set of trees every one of which spans over another given tree?

2 Answers2