18

I have a tree data structure that is L levels deep each node has about N nodes. I want to work-out the total number of nodes in the tree. To do this (I think) I need to know what percentage of the nodes that will have children.

What is the correct term for this ratio of leaf nodes to non-leaf nodes in N?

What is the formula for working out the total number nodes in the three?

Update Someone mention Branching factor in one of the answer but it then disappeared. I think this was the term I was looking for. So shouldn't a formula take the branching factor into account?

Update I should have said an estimate about a hypothetical datastructure, not the exact figure!

tpower
  • 56,100
  • 19
  • 68
  • 100
  • I took out branching factor because that's the term for what you've called N. I then realised you were looking for the ratio of leaf to inner nodes. – Mark Pim Feb 05 '09 at 10:21

6 Answers6

33

Ok, each node has about N subnodes and the tree is L levels deep.

With 1 level, the tree has 1 node.
With 2 levels, the tree has 1 + N nodes.
With 3 levels, the tree has 1 + N + N^2 nodes.
With L levels, the tree has 1 + N + N^2 + ... + N^(L-1) nodes.

The total number of nodes is (N^L-1) / (N-1).

Ok, just a small example why, it is exponential:

                    [NODE]
                      |
                     /|\
                    / | \
                   /  |  \
                  /   |   \
            [NODE]  [NODE] [NODE]
              |
             /|\
            / | \
Toon Krijthe
  • 52,876
  • 38
  • 145
  • 202
  • Need a correction: [..] and the tree is _L_ (was N) levels deep. +1 anyway :) – Andrea Ambu Feb 05 '09 at 10:22
  • 3
    +1, but it wouldn't hurt to spell out the algebra showing why 1+N+N^2+...+N^L = (N^L-1)/(N-1). – j_random_hacker Feb 05 '09 at 10:25
  • Edit: I made Andrea Ambu's suggested change and also fixed a typo. – j_random_hacker Feb 05 '09 at 10:33
  • In fact your solution to the equation is a bit off - it should be simply (N^L)-1 (imagine if N were 2 and L were 3 - the sum is 1+2+4=7=(2^3)-1). This can be shown fairly easily through induction. – HenryR Feb 05 '09 at 10:37
  • 1
    @HenryR: I derived Gamecat's formula independently. If N=2, then your proposed answer is *also* correct as dividing by (N-1) doesn't change anything in this case. It's when N>2 that you need to perform the division to get a correct answer. – j_random_hacker Feb 05 '09 at 10:43
16

Just to correct a typo in the first answer: the total number of nodes for a tree of depth L is (N^(L+1)-1) / (N-1)... (that is, to the power L+1 rather than just L).

This can be shown as follows. First, take our theorem:

1 + N^1 + N^2 + ... + N^L = (N^(L+1)-1)/(N-1)

Multiply both sides by (N-1):

(N-1)(1 + N^1 + N^2 + ... + N^L) = N^(L+1)-1.

Expand the left side:

N^1 + N^2 + N^3 + ... + N^(L+1) - 1 - N^1 - N^2 - ... - N^L.

All terms N^1 to N^L are cancelled out, which leaves N^(L+1) - 1. This is our right hand side, so the initial equality is true.

  • This won't be true for none-binary tree. For example, if each node has 8 child. 8^2-1 = 63. for depth of 1 – ZpfSysn Apr 28 '19 at 04:20
  • 1
    @aDev You forgot to divide by (N-1), or 7 in this case, which will result in a final answer of 9, which is correct for N=8, L=1. – David R. Oct 15 '19 at 16:50
1

The formula for calculating the amount of nodes in depth L is: (Given that there are N root nodes)

NL

To calculate the number of all nodes one needs to do this for every layer:

for depth in (1..L)
    nodeCount += N ** depth

If there's only 1 root node, subtract 1 from L and add 1 to the total nodes count.

Be aware that if in one node the amount of leaves is different from the average case this can have a big impact on your number. The further up in the tree the more impact.

     *                *                 *         N ** 1
    ***              ***               ***        N ** 2
*** *** ***      *** *** ***       *** *** ***    N ** 3

This is community wiki, so feel free to alter my appalling algebra.

Georg Schölly
  • 124,188
  • 49
  • 220
  • 267
David Grant
  • 13,929
  • 3
  • 57
  • 63
1

If your tree is approximately full, that is every level has its full complement of children except for the last two, then you have between N^(L-2) and N^(L-1) leaf nodes and between N^(L-1) and N^L nodes total.

If your tree is not full, then knowing the number of leaf nodes doesn't help as a totally unbalanced tree will have one leaf node but arbitrarily many parents.

I wonder how precise your statement 'each node has about N nodes' is - if you know the average branching factor, perhaps you can compute the expected size of the tree.

If you are able to find the ratio of leaves to internal nodes, and you know the average number of children, you can approximate this as (n*ratio)^N = n. This won't give you your answer, but I wonder if someone with better maths than me can figure out a way to interpose L into this equation and give you something soluble.

Still, if you want to know precisely, you must iterate over the structure of the tree and count nodes as you go.

HenryR
  • 8,219
  • 7
  • 35
  • 39
1

Knuth's estimator [1],[2] is a point estimate that targets the number of nodes in an arbitrary finite tree without needing to go through all of the nodes and even if the tree is not balanced. Knuth's estimator is an example of an unbiased estimator; the expected value of Knuth's estimator will be the number of nodes in the tree. With that being said, Knuth's estimator may have a large variance if the tree in question is unbalanced, but in your case, since each node will have around N children, I do not think the variance of Knuth's estimator should be too large. This estimator is especially helpful when one is trying to measure the amount of time it will take to perform a brute force search.

For the following functions, we shall assume all trees are represented as lists of lists. For example, [] denotes the tree with the single node, and [[],[[],[]]] will denote a tree with 5 nodes and 3 leaves (the nodes in the tree are in a one-to-one correspondence with the left brackets). The following functions are written in the language GAP.

The function simpleestimate gives an output an estimate for the number of nodes in the tree tree. The idea behind simpleestimate is that we randomly choose a path x_0,x_1,...,x_n from the root x_0 of the tree to a leaf x_n. Suppose that x_i has a_i successors. Then simpleestimate will return 1+a_1+a_1*a_2+...+a_1*a_2*…*a_n.

point:=tree; prod:=1; count:=1; list:=[]; 
while Length(point)>0 do prod:=prod*Length(point); count:=count+prod; point:=Random(point); od;
return count; end;

The function estimate will simply give the arithmetical mean of the estimates given by applying the function simpleestimate(tree) samplesize many times.

estimate:=function(samplesize,tree) local count,i; 
count:=0; 
for i in [1..samplesize] do count:=count+simpleestimate(tree); od; 
return Float(count/samplesize); end;

Example: simpleestimate([[[],[[],[]]],[[[],[]],[]]]); returns 15 while estimate(10000,[[[],[[],[]]],[[[],[]],[]]]); returns 10.9608 (and the tree actually does have 11 nodes).

  1. Estimating Search Tree Size. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.5569&rep=rep1&type=pdf

  2. Estimating the Efficiency of Backtrack Programs. Donald E. Knuth http://www.ams.org/journals/mcom/1975-29-129/S0025-5718-1975-0373371-6/S0025-5718-1975-0373371-6.pdf

0

If you know nothing else but the depth of the tree then your only option for working out the total size is to go through and count them.

Mark Pim
  • 9,898
  • 7
  • 40
  • 59
  • Actually he also knows the average, or expected, number of children per node. This plus the height of the tree is enough information to work out the average, or expected, number of nodes in the tree. – j_random_hacker Feb 05 '09 at 10:18
  • Yep, that was my mistake - I read the question as needing to know the exact size of the tree. – Mark Pim Feb 05 '09 at 10:22