0

enter image description here

Ignoring space complexity, assuming each node in the tree is touched exactly once and considering DFS and BFS traversal time equivalent, what is the time complexity of traversing an n-ary tree?

Given that Big O notation is an asymptotic measure, meaning that we are looking a function that gives us a line or curve that best fits the problem as it is extended to more levels or branches.

My intuition tells me that for tree structures in general we would want a function of the sort b^l where b is the branching factor and l is the number of levels in the tree (for a full and complete tree).

However for a partial tree, it would make sense to take some sort of average of b and l, perhaps AVG(b)^AVG(l).

In looking for this answer I find many people are saying it is O(n) where n is the number of verticies in the tree (nodes -1). See: What is the time complexity of tree traversal? and Complexity of BFS in n-ary tree

But a linear solution in my mind does not model the cost (in time) that the algorithm will take as the tree adds additional levels (on average). Which is what I understand Big O notation is intended to do.

janst
  • 102
  • 10
  • A traversal of a tree touches every node in the tree once, by definition. There are N nodes, so the time complexity is O(N). If your traversal is inefficient and visits nodes multiple times, then the complexity might be worse. Fitting this into your intuition, you're trying to find an equation for `N` in terms of branches and height. You're looking for something like https://en.wikipedia.org/wiki/Faulhaber%27s_formula. You are correct that N (at least for a full b-ary tree) is in the form `N = 1 + b + b^2 + b^3 + ... + b*(l-1)`. – Welbog May 05 '22 at 15:30
  • Thanks for the comment, i shall clarify my question by assuming each node is touched only once – janst May 05 '22 at 15:39

1 Answers1

1

The height or branching factor of a tree are not the determining factors for the complexity of a complete traversal (whether it be BFS or DFS). It is only the number of vertices that is the determining factor.

If you want to express the complexity in terms of branching factor and height of the tree, then you are really asking what the relation is between these measures and the number of nodes in a tree.

As you already indicate, if the branching factor is 100 but it is only rarely that a node has more than 3 children, then the branching factor is not really telling us much. The same can be said about the height of the tree. A tree of height 4 and branching factor 2 can have between 5 and 31 nodes.

What to do then? Often, the worst case will be taken (maximising the number of nodes). This means that you'll translate branching factor and height to a tree that is perfect, i.e. where each node has the maximum number of children, except for the leaves of the tree, which are all on the same level.

The number of nodes is then ℎ+1-1, where is the branching factor, and ℎ the height (number of edges on the longest path from root to leaf).

That means the worst case time complexity for a given and ℎ is O(ℎ+1)

Working with average is not really practical. The distribution of the branching factor may not be linear, and the distribution of leaf-depths might not be linear either, so working with averages is not going to give much insight.

As to the cost of adding a node: Once the insertion point is determined, the time complexity for adding a node is constant. It does not matter which that insertion increases the branching factor or increases the height of the tree.

There is some variation when it comes to finding a node if the tree is a search tree (like a BST or B-tree). In that case the height of the tree becomes important, and a search would cost O(ℎ). If however the tree is self-balancing (like AVL or B-tree), this variation is limited and this complexity would be O(log)

trincot
  • 317,000
  • 35
  • 244
  • 286
  • Thanks for your thoughtful answer. I still imagine the graphs O(n) and O(avg of b^avg of L) to be equally viable for the purposes of a asymptote for measuring the time it takes to traverse this tree, granted the terms are different. But i do see the simplicity now of thinking of this in terms of how much more time it would take to double the number of nodes. Twice as long in this case. Or even the cost of adding one more node as you state: constant increase in time. – janst May 06 '22 at 04:22
  • If you could clarify what you mean by the averages not being practical i would appreciate the insight. – janst May 06 '22 at 04:30
  • I wonder if there is some requirement in Big O notation that you should always put things in terms of the most obvious complexity variable. (nodes in this case) Because I feel like there is a stigma of that sort in the computer science world. – janst May 06 '22 at 04:46
  • Thinking about it some more, linear would fit it better than the averages in most cases. The averages would occasionally fit as well as the linear, so this might be what you were hinting at. – janst May 06 '22 at 05:00
  • Concerning average: we would then need to be precise as to what we mean: would it be an *estimate*, or would really the average of each node's branching factor (i.e. their number of children) need to be taken. In the latter case, if we would do it literally, the sum of these branching factors would equal the number of edges, and averaging that per node would asymptotically lead to ... 1. Which is useless. – trincot May 06 '22 at 05:12
  • assuming the branching average branching factor was >= 2, I don't see it asymptotically leading to 1. – janst May 06 '22 at 05:55
  • If a node has a branching factor of 2, it means it has two children. So the sum of all individual branching factors in a tree corresponds to the number of edges (realise that leaves have a branching factor of 0). The number of edges in a tree is always one less than the number of nodes (this is a property of trees). So the average (per node) is (n-1)/n when n is the number of nodes. – trincot May 06 '22 at 05:57
  • I was not including the leaf nodes in the divisor. – janst May 06 '22 at 06:07
  • 1
    This illustrates why a specific definition of "average" is needed. – trincot May 06 '22 at 06:08