
I'm learning the "weighted quick-union with path compression" algorithm for a union/find structure. The Princeton course site explains the algorithm in detail, and here is the implementation in Java:

public class WQUPC {
  private int[] id;  // id[i] = parent of i; a root points to itself
  private int[] sz;  // sz[i] = number of nodes in the tree rooted at i

  public WQUPC(int N) {
    id = new int[N];
    sz = new int[N];
    for (int i = 0; i < N; i++) {
      id[i] = i;
      sz[i] = 1;
    }
  }

  // Find the root of i, halving the path along the way (path compression).
  int root(int i) {
    while (i != id[i]) {
      id[i] = id[id[i]];  // point i at its grandparent
      i = id[i];
    }
    return i;
  }

  boolean connected(int p, int q) { return root(p) == root(q); }

  // Weighted union: hang the smaller tree under the root of the larger one.
  void union(int p, int q) {
    int i = root(p);
    int j = root(q);
    if (i == j) return;  // already connected; without this, sizes get double-counted
    if (sz[i] < sz[j]) {
      id[i] = j;
      sz[j] += sz[i];
    } else {
      id[j] = i;
      sz[i] += sz[j];
    }
  }
}
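
For context, here is a minimal way the class might be exercised (the element count and the pairs are arbitrary, just for illustration):

    public class WQUPCDemo {
      public static void main(String[] args) {
        WQUPC uf = new WQUPC(10);                // ten isolated elements, 0..9
        uf.union(3, 4);
        uf.union(4, 9);
        System.out.println(uf.connected(3, 9));  // true: 3, 4 and 9 are in one set
        System.out.println(uf.connected(3, 5));  // false: 5 was never unioned
      }
    }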

But here is what the website says about its performance:

Theorem: Starting from an empty data structure, any sequence of M union and find operations on N objects takes O(N + M lg* N) time.

• Proof is very difficult.

• But the algorithm is still simple!

But I'm still curious about where the iterated logarithm lg* N comes from. How is it derived? Can someone prove it, or just explain it in an intuitive way?
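
For concreteness, lg* N counts how many times you have to apply lg before the value drops to 1 or below. A tiny helper I wrote to convince myself how slowly it grows (not part of the course code):

    // Iterated logarithm: how many times must lg(.) be applied before n <= 1?
    static int lgStar(double n) {
      int count = 0;
      while (n > 1) {
        n = Math.log(n) / Math.log(2);  // lg n
        count++;
      }
      return count;
    }
    // Mathematically lg* 16 = 3, lg* 65536 = 4 and lg* (2^65536) = 5,
    // so for any input that fits in a computer, lg* N is at most about 5.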


2 Answers


To begin with, your question has a slight mistake: the complexity with just path compression is only O(m log(n)) (without the iterated log). See, for example, exercise 21-4.4 in Introduction to Algorithms. In fact, your block

    if (sz[i] < sz[j]) {
      id[i] = j;
      sz[j] += sz[i];
    } else {
      id[j] = i;
      sz[i] += sz[j];
    }

does union by rank.

With both union by rank and path compression, though, the expression you used can be proved easily (much more easily than the inverse Ackermann one). The proof is based on three points:

  1. Along each leaf-to-root path, the ranks of the nodes are strictly increasing. This in fact relies on union by rank, BTW, and with it, it is very easy to show.

  2. If the root of a tree has rank r, the tree has at least 2^r nodes. This can be shown by induction: a root's rank only increases when it absorbs another root of the same rank, so a tree whose root has rank r + 1 contains two trees whose roots had rank r.

Based on 2., it's possible to show

  3. The maximal number of nodes with rank r is at most n / 2^r.

The rest of the proof follows from a clever arrangement of the worst possible way the ranks could be organized, which shows that not too many nodes can have large rank. For further details, see the Wikipedia entry on the disjoint-set data structure.
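
To give a flavour of how points 1-3 turn into the lg* bound, here is my own compressed version of the standard charging argument (Hopcroft-Ullman style); it is a sketch, not the full proof:

    Group ranks into blocks with the tower function
    \[
      T(0) = 1, \qquad T(k) = 2^{T(k-1)} \quad (T = 1, 2, 4, 16, 65536, \dots),
    \]
    putting rank $r$ into block $B_k$ when $T(k-1) < r \le T(k)$ (ranks $0, 1$ go to $B_0$).
    Every rank is at most $\lg n$ by point 2, so there are at most about $\lg^* n$ blocks.
    By point 3, the number of nodes whose rank lies in $B_k$ is at most
    \[
      \sum_{r > T(k-1)} \frac{n}{2^r} = \frac{n}{2^{T(k-1)}} = \frac{n}{T(k)}.
    \]
    During a find, charge the hop from a node $v$ to its parent to the find itself if the
    parent's rank is in a higher block than $v$'s rank (at most $\lg^* n$ such hops per
    find), and to $v$ otherwise. Path compression always gives $v$ a parent of strictly
    larger rank (point 1), so $v$ absorbs at most $T(k)$ charges before its parent's rank
    leaves $B_k$ and the find starts paying instead. Summing:
    \[
      \underbrace{\textstyle\sum_k \frac{n}{T(k)} \cdot T(k)}_{\text{charged to nodes}} \le n \lg^* n,
      \qquad
      \underbrace{m \lg^* n}_{\text{charged to finds}},
    \]
    for a total of $O((n + m)\lg^* n)$: the same $\lg^*$ phenomenon as in the theorem
    quoted in the question, whose statement is slightly sharper.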

Ami Tavory
  • Thx, but I still don't get it: I use the `sz` array to make it **weighted**, not only path compression, am I wrong? – pjpj Sep 02 '16 at 13:29
  • Adding to this answer - there's a great paper, "Top-Down Analysis of Path Compression" by Seidel and Sharir, that shows how to derive the log* n bound, then to use that to get a log** n bound, then a log*** n bound, and eventually to converge to the Ackermann inverse bound. It's a beautiful argument! I put together [a set of lecture slides](http://web.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/16/Slides16.pdf) based on the paper that give some nice visuals for it. – templatetypedef Apr 29 '20 at 07:02
  • @templatetypedef I read the paper, didn't fully understand what was happening, then looked at your slides... and that feeling when you get the idea... your slides completed the paper :) Thank you so much for sharing :) – maharshi Feb 27 '21 at 21:51

As I recall, the proof had to do with amortizing the cost of the path compression over a set of searches. Shallow searches are cheap, and don't incur the cost of much path compression; deep searches cost a lot for that search and the path compression, but the path compression makes subsequent searches much cheaper on average.
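
To make that concrete, here is a tiny hand-traceable run against the class from the question (my own example; the element count and the order of the unions are arbitrary). The first deep find pays for shortening the path, and the very next find on the same element is already cheaper:

    WQUPC uf = new WQUPC(8);
    // Pair the elements up, then merge the pairs; this leaves a tree rooted at 0
    // that contains the chain 7 -> 6 -> 4 -> 0.
    uf.union(0, 1);  uf.union(2, 3);  uf.union(4, 5);  uf.union(6, 7);
    uf.union(0, 2);  uf.union(4, 6);
    uf.union(0, 4);

    uf.connected(7, 1);  // root(7) loops twice and re-points 7 from 6 up to 4
    uf.connected(7, 1);  // root(7) loops once and re-points 7 directly at the root 0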

Scott Hunter