Subtrees of a tree

Question

I have a given tree with n nodes. The task is to find the number of subtrees of the given tree with outgoing edges to its complement less than or equal to a given number K.

for example: If n=3 and k=1

and the given tree is 1---2---3

Then the total valid subtrees would be 6

{}, {1}, {3}, {1,2}, {2,3}, {1,2,3}

I know I can enumerate all 2^n trees and chack the valid ones, but is there some approach that is faster? Can I achieve polynomial time in n? Something close to O(n^3) or even O(n^4) would be nice.

EDIT: for k=1 this value turns out to be 2*n

You are wrong here, please ask this question on http://math.stackexchange.com/ — Maurice, Jun 01 '13 at 11:39
@Bergi 1---2---3 is a tree mate. there are 3 nodes and 2 edges. There can be more complex trees, i took this one as an easy example. — Salena, Jun 01 '13 at 11:41
Why isn't the `{2}` being considered? Must the complement be a tree as well? — ypercubeᵀᴹ, Jun 01 '13 at 11:49
@ypercube because it has 2 outgoing edges to its complement tree, and we only consider subtrees with only upto k (i.e. 1 here) outgoing edges to the complement tree. No it is not necessary that the complement be a tree. — Salena, Jun 01 '13 at 11:50
You have something wrong there. The complement of `{2}` is `{1},{3}` which is not a tree. If the complement is not necessary to be a tree, correct the question with the proper term. — ypercubeᵀᴹ, Jun 01 '13 at 11:51
@ypercube yes i removed the term tree used for the complement. Thanks for pointing that out. — Salena, Jun 01 '13 at 11:54
For `k=1`, the number of subtrees is (obviously) `2*E+2` where `E` is the number of edges of the original tree. — ypercubeᵀᴹ, Jun 01 '13 at 11:57
@MauriceA. I'm not outright disagreeing, but does math.stackexchange.com concern themselves with computational complexity? Seems like it is also well suited to SO, to me. — Wayne Uroda, Jun 01 '13 at 11:57
@ypercube how can you say that? Since it is a tree we will have E+1 nodes. so you imply that we will have 2*N such subtrees without the need to go through the edges list. — Salena, Jun 01 '13 at 12:01
@ypercube We're talking about unrooted trees here, and "subtree" means a connected subgraph. — David Eisenstat, Jun 01 '13 at 12:03
@ypercube yes you are correct for the k==1 part. How did you infer this? something that can be used for furthur values of k? — Salena, Jun 01 '13 at 12:18
If you remove an edge, you get 2 distinct subtrees. Any subtree that has 1 "outgoing" edge to the complement will be one of them. The `+2` is from the empty set and the whole tree. — ypercubeᵀᴹ, Jun 01 '13 at 12:20
For `k=2`, I think it is `E*(E-1)/2 + 2*E + 2`. But I don't think it can be generalized for bigger k. Not easily at least. — ypercubeᵀᴹ, Jun 01 '13 at 12:23
@Abhishek: I know that lists are trees, only the example was too trivial and I had expected something demonstrating the expected result for a complex example :-) Btw, how do you have your trees/graphs stored (adjacency list/matrix/something else)? Runtime of algorithm depends on that quite heavily… — Bergi, Jun 01 '13 at 12:24

score 3 · Answer 1 · answered Jun 01 '13 at 12:53

3

This is a fairly typical instance of the DP-on-a-tree paradigm. Let's generalize the problem slightly by allowing the specification of a root vertex v and stratifying the counts of the small-boundary trees in two ways: whether v is included, and how many edges comprise the boundary.

The base case is easy. There are no edges and thus two subtrees: one includes v, the other excludes v, and both have no boundary edges. Otherwise, let e = {v, w} be an edge incident to v. The instance looks like this.

|\         /|
| \   e   / |
|L v-----w R|
| /       \ |
|/         \|

Compute recursively the stratified counts for L rooted at v and R rooted at w.

Subtrees that include v consist of a subtree in L that includes v, plus optionally e and a subtree in R that includes w. Subtrees that don't include v consist of either a subtree in L that doesn't include v, or a subtree in R (double counting the empty tree). This means we can obtain the stratified counts by convolving the stratified counts for L with the stratified counts for R.

Here's how this works on your example. Let's choose root 1.

  e
1---2---3

We choose e as shown and recurse.

The vector for includes-1 is [1], since the one subtree is {1}, with no boundary. The vector for excludes-1 is [1], since the one subtree is {}, also with no boundary.

2---3

We compute 2 and 3 as we did for 1. The vector for includes-2 is [1, 1], since {2, 3} has no boundary edges, and {2} has one. We obtained this vector by adding the includes-2 vector for 2, shifted by one because of the new boundary edge to make [0, 1], to the convolution of the includes-2 vector for 2 with the includes-3 vector for 3, which is [1, 0]. The vector for excludes-2 is [1] + [1, 1] - [1] = [1, 1], where [1, 1] is the sum of the shifted includes-3 vector and the excludes-3 vector, and the subtraction is to compensate for double-counting {}.

Now, for the original invocation, to get the includes-1 vector, we add [0, 1], the includes-1 vector for 1 shifted by one, to the convolution of [1] with [1, 1], obtaining [1, 2]. To check: {1, 2, 3} has no boundary, and {1} and {1, 2} have one boundary edge. The excludes-1 vector is [1] + [1, 2, 1] - [1] = [1, 2, 1]. To check: {} has no boundary, {2, 3} and {3} have one boundary edge, and {2} has two boundary edges.

answered Jun 01 '13 at 12:53

David Eisenstat

64,237
7
60
120

I don't get you. Could you explain it in a more simpler way? – Salena Jun 01 '13 at 12:59
2

@Abhishek No, I can't. Please spend more than 7 minutes thinking about it. – David Eisenstat Jun 01 '13 at 13:00
Any particular source wich could help me in understanding the include/exclude vectors? – Salena Jun 01 '13 at 13:13
@DavidEisenstat Could you recommend specific material regarding dynamic programming? For some reason it wasn't covered in my undergrad degree and won't be in my master's, either; I'd greatly appreciate suggestions, if you have any off the top of your head. – G. Bach Jun 01 '13 at 20:57
@G.Bach Dynamic programming is usually taught in an intro algorithms course – is there none available to you in your master's? I don't have any recommendations other than to take advantage of the large variety of problems amenable to dynamic programming by practicing a lot. – David Eisenstat Jun 01 '13 at 21:24
@DavidEisenstat Can you please elaborate your solution a bit. I spent a lot of time understanding it, but couldn't. I understand the base case and that you break down the graph at 'e', but from thereon it is confusing. I dont understand how you combine the subproblems. Also how you choose 'e'? is it random? Also the term boundary? does it imply an outgoing edge to complement? And the vector shifting, how is that done? – Salena Jun 01 '13 at 21:33
@DavidEisenstat I looked through the material on the introductory course on algorithms as well as the two master courses on advanced algorithms. None have sections on dynamic programming, while of course Dijkstra's algorithm and the Bellman-Ford algorithm were covered. – G. Bach Jun 01 '13 at 21:34
1

@G.Bach There is a nice SO answer, http://stackoverflow.com/questions/6184869/what-is-difference-between-memoization-and-dynamic-programming. The excellent [Algorithm Design Manual](http://www.amazon.com/Algorithm-Design-Manual-Steven-Skiena/dp/1848000693) covers it in detail. To quote, "DP efficiently implements a recursive algorithm by storing partial results. The trick is seeing whether the naive algorithm computes the same subproblems over and over and over again. If so, storing the answer for each subproblems in a table to look up instead of recompute can lead to an efficient algorithm." – TooTone Jun 02 '13 at 10:05
@TooTone Thanks, I'll have a look at those! – G. Bach Jun 02 '13 at 11:01
@DavidEisenstat what do you mean by the term convolution in the second last and last paragraph of your answer _"to the convolution of [1] with [1, 1]"_ how do you process these two vectors? – Salena Jun 03 '13 at 23:05
can u please tell what you are storing in the vector and how you are taking count of boundary edges – kushpf Jun 06 '13 at 16:56

score 2 · Accepted Answer · answered Jul 09 '13 at 19:09

Here is my python implementation of David Eisenstat's solution:

from sys import stdin
from numpy import *
from scipy import *

def roundup_pow2(x):
  """
  Round up to power of 2 (obfuscated and unintentionally faster :).
  """
  while x&(x-1):
    x = (x|(x>>1))+1
  return max(x,1)

def to_long(x):
    return long(rint(x))

def poly_mul(a,b):
  n = len(a) + len(b) - 1
  nr = roundup_pow2(n)
  a += [0L]*(nr-len(a))
  b += [0L]*(nr-len(b)) # pad with zeros to length n
  u = fft(a)
  v = fft(b)
  w = ifft(u*v)[:n].real             # ifft == inverse fft
  return map(to_long,w)

def pad(l,s) :
    return l+[0L]*(s-len(l))

def make_tree(l,x,y):
    l[x][y]=y
    l[x].pop(y)
    for child in l[x]:
        make_tree(l,child,x)

def cut_tree(l,x) :
    if len(l[x])==0:
        return [1L],[1L]
    y,_ = l[x].popitem()
    ai,ax=cut_tree(l,x)
    bi,bx=cut_tree(l,y)

    ci=[0L]+ai
    tmp=poly_mul(ai,bi)
    padlen=max(len(ci),len(tmp))
    ci=pad(ci,padlen)
    tmp=pad(tmp,padlen)
    ci=map(add,ci,tmp)

    cx=[0L]+bi
    padlen=max(len(cx),len(bx),len(ax))
    cx=pad(cx,padlen)
    bx=pad(bx,padlen)
    ax=pad(ax,padlen)
    tmp=pad([-1],padlen)
    cx=map(add,cx,bx)
    cx=map(add,cx,ax)
    cx=map(add,cx,tmp)

    return ci,cx



n,k = map(int,raw_input().split())
l=[{}]
for i in range(1,n+1):
    d={}
    l.append(d)
for i in range(1,n):
    x,y = map(int,raw_input().split())
    l[x][y]=y
    l[y][x]=x
make_tree(l,1,0)

i,x = cut_tree(l,1)
padlen=max(len(i),len(x))
i=pad(i,padlen)
x=pad(x,padlen)
combined=map(add,i,x)
sum=0L
for i in range(0,k+1) :
    sum+=combined[i]

print sum

score 0 · Answer 3 · answered Jul 27 '18 at 13:49

Let us create a slightly bigger tree like below.

Let us define a function F(a, k) for each node 'a' with 'k' edges removed from node 'a' and below. i.e. if 'k' edges are removed from node 'a' then we create F(a, k) number of subtrees. (If 'a' is not root, it is assumed to be connected to it's parent).

e.g. in above tree ( F(4, 1) = 2 ), as we create 2 trees by removing 2 edges below '4' (we assume that 4 is connected to parent and subtrees (5) and (6) are not counted in F(4,1))

We traverse and calculate 'F' of each child first. Then using child's F we calculate parents F.

F(a, k) of a leaf node is '0' for all k

For non-leaf nodes.

F(a, k) = SUM (F(child, k)) + Z

While F(child, k) can be calculated recursively.

Z on the other hand is calculated by finding all combinations where some child take ri edges out of k such that SUM(ri) = k

Programmatically this can be done by fixing 'j' edge for a given child and then calculating the number of trees created by distributing 'k-j' edges to other children.

e.g. in above tree

F(1, 3) = F(2, 3) + F(3, 3) + F(4, 3) +  // we pass k as-is to child  
      F(2,1)*F(3,1)*F(4,1) + F(2,1)*F(3,2) + F(2,1)*F(4,2) + //consume 1 edge by 2 and distribute 2 to other children
      F(2, 2)*F(3,1) + F(2,2)*F(4,1) + // consume 2 edges from node '2' and 1 for other children
      F(3,1)*F(4,2)

As we see above, we fix 'r' edge for node 2 and then distribute '3-r' edges to other children. We keep doing this for all children of '1'.

Additionally, we create sub-trees when we detach a node from parent. e.g. in above case when we calculate F(1, 3) we create the following detached trees. detached_tree += F(2, 2) + F(3, 2) + F(4, 2)
Here we assume that one edge is consumed by detaching child node from parent, and in child node if we consume 'k-1' edges we will create F(child, k-1) subtrees. These trees are counted and stored seperately in detached_trees.

Once we have calculated the F(a,k) of all nodes.

The total subtrees are 'SUM(F(root, k)) for all k' + 'total nodes - 1' + detached_trees.

We add 'total nodes - 1' to our total. This is because when a node (except root) is detached from a tree, it creates two trees with 1 edge missing. While one of the tree is counted in F(parent, 1), the other is not counted anywhere, hence needs to be counted in total.

Here is C code of above algorithm. The recursion can be further optimized.

#define MAX 51
/* We use the last entry of alist to store number of children of a given node */
#define NUM_CHILD(alist, node) (alist[node][MAX])
int alist[MAX][MAX+1] = {0};
long F[MAX][MAX]={0};
long detached_subtrees = 0;

/*
 *  We fix one of the child node for 'i' edges out of 'n', then we traverse
 *  over  the rest of the children to get 'n-i' edges, we do so recursivly.
 *  Note that if 'n' is 1, we can always build a subtree by detaching.
 */
long REST_OF_NODES_SUM(int node, int q, int n)
{
    long sum = 0, i, node2, ret = 0, nd;

    /* fix node2 and calcualte the subtree for rest of the children */
    for(nd = q; nd < NUM_CHILD(alist, node); nd++) {
            node2 = alist[node][nd];
            /* Consume 'i' edges and send 'n-i' for other children of node */
            for (i = 1; i < n ; i++) {
                    sum = REST_OF_NODES_SUM(node, nd + 1, n - i);
                    ret += (F[node2][i] * sum);
                    /* Add one for 'node2' getting detached from tree */
                    if (i == 1) { ret += sum; }
            }
            ret += F[node2][n];
            /* If only one edge is to be consumed, we detach 'node2' from the tree */
            if (n == 1) { ret++; }
    }

    return ret;
}

void get_counts(int N, int K, int node, int root)
{
    int child_node;
    int i, j, p, k;

    if (NUM_CHILD(alist, node) == 0) { return; }

    for(i = 0 ; i < NUM_CHILD(alist, node); i++) {
            child_node = alist[node][i];
            /* Do a recursive traversal of all children */
            get_counts(N, K, child_node, node);
            F[node][1] += (F[child_node][1]);
    }

    F[node][1] += NUM_CHILD(alist, node);

    for (k = 2; k <= K; k++) {
            for(p = 0; p < NUM_CHILD(alist, node); p++) {
                    child_node = alist[node][p];
                    F[node][k] += F[child_node][k];
                    /* If we remove this child, then we create subtrees in the child */
                    detached_subtrees += F[child_node][k-1];

                    /* Assume that 'child_node' is detached, find tree created by rest
                     * of children for 'k-j' edges */
                    F[node][k] += REST_OF_NODES_SUM(node, p + 1, k - 1);

                    /* Fix one child node for 'j' edges out of 'k' and traverse over the rest of
                     * children for 'k - j' edges */
                    for (j = 1; j < k ; j++) {
                            if (F[child_node][j]) F[node][k] += (F[child_node][j] * REST_OF_NODES_SUM(node, p + 1, k - j));
                    }
            }
    }
}


void remove_back_ref(int parent, int node)
{
    int c;
    for (c = 0; c < NUM_CHILD(alist, node); c++) {
            if (alist[node][c] == parent) {
                    if ((c + 1) == NUM_CHILD(alist, node)) {
                            NUM_CHILD(alist, node)--;
                            alist[node][c] = 0;
                    } else {
                            /* move last entry here */
                            alist[node][c] = alist[node][NUM_CHILD(alist, node)-1];
                            alist[node][NUM_CHILD(alist, node)-1] = 0;
                            NUM_CHILD(alist, node)--;
                    }
            }
    }
}

/* go to each child and remove back links */
void normalize(int node)
{
    int j, child;

    for (j = 0; j < NUM_CHILD(alist, node); j++) {
            child = alist[node][j];
            remove_back_ref(node, child);
            normalize(child);
    }
}

long cutTree(int N, int K, int edges_rows, int edges_columns, int** edges)
{
    int i, j;
    int node, index;
    long ret = 0;

    /* build an adjacency list from the above edges */
    for (i = 0; i < edges_rows; i++) {
            alist[edges[i][0]][NUM_CHILD(alist, edges[i][0])] = edges[i][1];
            alist[edges[i][1]][NUM_CHILD(alist, edges[i][1])] = edges[i][0];
            NUM_CHILD(alist, edges[i][0])++;
            NUM_CHILD(alist, edges[i][1])++;
    }

    /* get rid of the back links in children */
    normalize(1);
    get_counts(N, K, 1, 1);
    for (i = 1; i <= K; i++) { ret += F[1][i]; }
    /* Every node (except root) when detached from tree, will create one extra subtree. */
    ret += (N - 1);
    /* The subtrees created by detaching from parent */
    ret += detached_subtrees;
    /* Add two for empty and full tree */
    ret += 2;

    return ret;
}

main(int argc, char *argv[])
{
    int **arr;
    int ret, i, N, K, x, y;

    scanf("%d%d", &N, &K);
    arr = malloc((N - 1) * sizeof(int*));
    for (i = 0; i < (N - 1); i++) { arr[i] = malloc(2*sizeof(int)); }

    for (i = 0; i < N-1; i++) { scanf("%d%d", &x, &y); arr[i][0] = x; arr[i][1] = y; }

    printf("MAX %d ret %ld\n", MAX, cutTree(N, K, N-1, 2, arr));
}

Subtrees of a tree

3 Answers3