Comparison function for DFS ordering of an arbitrary tree

Question

Imagine a complete binary tree where nodes at each level of depth are numbered left to right.

Node 1 has children 2 and 3.
Node 2 has children 4 and 5.
Node 3 has children 6 and 7.

etc.

At depth D there will be 2^D nodes, with numbers 2^D ... 2^(D+1)-1

The depth-first search traversal for a complete tree of any depth is deterministic.

For example, a tree of depth 4 will always be traversed: 1,2,4,8,9,5,10,11,3,6,12,13,7,14,15.

I am looking for a way to sort a list of numbers by where they would fall in the DFS traversal of any tree.

In particular, I would like a comparison function that could take two numbers and determine which comes first in the DFS traversal.

Any ideas?

Pre-computing the DFS traversal for some maximum tree size is one way to do this, but I would prefer a mathematical solution that doesn't require computing and storing that information.

There are some terminologies that you can use in place of what you've written: "ideal tree" -> "complete binary tree", "traversal .. is constant" -> "traversal .. is deterministic" — justhalf, Nov 15 '13 at 04:40

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

The algorithm with the best performance would be the one suggested by FUD, since you'll only need to traverse the tree once, and then the comparison will just be O(1).

But if you don't want to traverse the whole tree, and just want a comparator, there is a O(log n) comparator (which can be optimized to O(log log n), or practically O(1)).

The idea is:

Observation 1: If the two nodes are on the same depth, the higher numbered node will be traversed later.

Observation 2: If the two nodes are not on the same depth, by noting that parent is always visited first before descendants, we take the ancestor of the deeper node which is on the same depth as the more shallow node. Then compare using Observation 1.

Using your number system in the complete binary tree, you can get a parent of node n by taking n/2. So, after getting the depths of each node (can be done in O(log n), or precomputed), say d1 < d2, we divide the deeper node with 2^(d2-d1) (power can be done in O(log p), in this case p is O(log n), so it's O(log log n)). Then see which one is bigger =)

in C++:

// This method can be modified to be faster
// See: http://stackoverflow.com/questions/671815/what-is-the-fastest-most-efficient-way-to-find-the-highest-set-bit-msb-in-an-i
int depth(int n){
    int result=-1;
    for(;n>0; result++, n/=2);
    return result;
}

bool n1_before_n2(int n1, int n2){
    int d1 = depth(n1);
    int d2 = depth(n2);
    if(d1>d2) n1 >>= (d1-d2);
    if(d2>d1) n2 >>= (d2-d1);
    return n1<n2;
}

The depth of a node with number n you can get in O(1) with floor(log(2, n)). I'm not sure I follow your answer though - after we've found which one is deeper we divide by 2 to the power of the difference in depths? How would that compare two nodes of the same depth, e.g. 2 & 3? — logworthy, Nov 15 '13 at 05:03
`log(2,n)` is theoretically not `O(1)`, but ok, you can assume so. If you have nodes with the same depth, we use first observation, the higher numbered node is visited later. The division is done to take the ancestor of the deeper node. The ancestor will be on the same depth as the other node. — justhalf, Nov 15 '13 at 05:06

score 1 · Answer 2 · edited May 23 '17 at 10:31

Here's a C implementation of @Abhishek's answer:

//returns -1 if a before b; 0 if same; else 1
int treesort(unsigned int a,unsigned int b)
{
  int diff, swap=1, side=0;
  unsigned int ra, rb;
  if (a==b) return 0;  
  //ensure deeper node is always in b.
  if (b<a) { int t=a;a=b;b=t;swap=-1;}
  //treat 0 as before everything else
  if (a==0) return -1*swap;
  //clear all but the msb
  ra=base(a); rb=base(b);
  //move up to same level, tracking child side
  while (rb!=ra)  { side=b&1;rb/=2;b/=2; }
  //compare parents at same level
  diff = (b-rb)-(a-ra);
  //if same parent, answer depends on side.
  if (diff==0) diff = side;
  //restrict to [-1,1], be sure to handle swap
  return (diff>0?-1:1)*swap;
}

base is a function that clears all except the most significant bit. I tested with
int base(unsigned int x){return 1<<(msb32(x)-1);} using Sir Slick's msb32() from this question.

score 0 · Answer 3 · answered Nov 15 '13 at 04:26

0

You can pre-compute for a maximum level of depth in a complete tree. And assign each node a increasing set of value for e.g. in your depth 4 tree

v[1]=1, v[2]=3, v[4]=3 ...

Then the comparison function would just be

int cmp(i,j):
    return v[i]<v[j]

answered Nov 15 '13 at 04:26

FUD

5,114
7
39
61

Sure. I'd like a mathematical solution that doesn't involve coding a DFS algorithm nor storage that increases with the size of the tree. I will update the question to specify. – logworthy Nov 15 '13 at 04:54
DFS is inefficient for tree which have structure in them for example if in complete binary tree of 2^10 - 1 node to find index of 3 it need 2^9 traversals where there we know that in complete BT the left subtree has half the number of nodes of total non root nodes so we can easily derive a formula for it. – Vikram Bhat Nov 16 '13 at 03:27

score 0 · Answer 4 · answered Nov 15 '13 at 05:11

If you notice the pattern, you can try the following. I'm not sure, it may need some refinement but it gives some idea for you to probe further.

For each node, find the highest power of two that is less than the value. For example, if we are comparing 7 and 13, then for 7, it will be 4 and for 13 it will be 8.
Now calculate the difference between the values and their respective powers of 2. For 7, it will be 3 and for 13, it will be 5.
If the powers of two are the same, then the one with lower difference shall come before the one with the higher difference.
For our case, calculate the parent of 13 in the same row as 7. This will be 13 / 2^(difference of levels between 7 and 13).

Hence, parent of 13 = 13 / 2^1 = 6.
Since 6 < 7, we can say that 13 comes before 7.

EDIT: as suggested, the unnecessary step 3 deleted.

You don't need step 2 and 3. Step 4 and 5 will cover all cases. — justhalf, Nov 15 '13 at 05:13

Vikram Bhat · Answer 5 · 2013-11-16T03:18:30.793

Here is mathematical solution to the problem but i could not get the closed form for the equation : -

To find DFS index of node k in tree of total node N: -

DFS-order(k) = DFS-order((k-1)/2) + 2^(log(N+1) - log((k-1)/2) - 1)       if k is odd

DFS-order(k) = DFS-order(k/2) + 1             if k is even 

Base Case: DFS-order(1) = 0

You can find a closed form for the above equation which i think is higher level mathematics.

Explanation:-

For odd nodes we first traverse all left substree nodes of parent and plus 1 for new index. We know that left subtree has half the nodes of complete binary tree rooted at parent of node excluding parent. Total nodes in complete BT is 2^d - 2 excluding the root. Left subtree has half of it 2^(d-1) - 1. d is depth of tree root at parent. Depth of tree rooted at node k is Total depth - log(k). Total depth is log(N+1). Hence No of Nodes in left subtree is 2^(log(N+1) - log((k-1)/2) - 1) - 1. We add 1 for one more traversal fron parent to current node hence final sum = 2^(log(N+1) - log((k-1)/2) - 1). We add this to parent index to get current node index hence DFS-order(k) = DFS-order((k-1)/2) + 2^(log(N+1) - log((k-1)/2) - 1).

For Even node it is trivial that its index is parent index + 1

Note: Equation can find index of node k in O(log(k)) time & log function use floor operator to get discrete values.

Comparison function for DFS ordering of an arbitrary tree

5 Answers5