
Recently I read this and was surprised that the time complexity of the union-find algorithm with path compression alone (no union by rank) is O((m + n) log n), where m is the number of 'find' queries and n is the number of 'merge' queries.

I was curious about this bound (I usually implement the algorithm without rank, and even when I once applied union by rank upside down, the performance was not bad!) and tried to construct an input sequence that actually forces it. But because path compression is so powerful, it was really hard...

Is there an example that actually achieves Omega((m + n) log n), or is that complexity just theoretical, not practical?
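
To be concrete, this is roughly what I mean by "path compression only" - a minimal sketch, not my exact code (the dict-based representation is just for illustration):

parent = {}  # parent[x] is x's parent; a key absent from the dict is a root

def find(x):
    root = x
    while root in parent:      # walk up to the root
        root = parent[root]
    while x in parent:         # path compression: point the whole path at the root
        parent[x], x = root, parent[x]
    return root

def merge(x, y):               # naive union: no rank or size heuristic at all
    rx, ry = find(x), find(y)
    if rx != ry:
        parent[rx] = ry

# e.g. merge(1, 2); merge(1, 3); find(3)  ->  3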

Love Paper
  • What do you mean by "example", a code snippet? Don't you say that you have already written this algorithm yourself? – Bergi Jul 21 '14 at 12:15
  • @Bergi By 'example' I meant a sequence of 'merge' and 'find' operations, for example "merge(1, 2), merge(1, 3), find(3)". – Love Paper Jul 21 '14 at 13:14
  • @Bergi: Without union-by-rank it's two lines of code. With union-by-rank it's five lines of code. It's not surprising that he's implemented it himself, and that's not what he's asking for. He's asking how to make it perform badly. – tmyklebu Jul 21 '14 at 13:59

2 Answers


Yes, there's a matching lower bound due to Michael J. Fischer in 1972 (see Section 3). His example uses a binomial tree of depth log n, where a binomial tree of depth 0 is a single node and a binomial tree of depth k consists of two binomial trees of depth k - 1 with the root of one pointing to the root of the other. The bad sequence repeatedly unions the root of the binomial tree so that it points to a fresh singleton and then does a find on the deepest node; each such find is expensive (a logarithmic number of steps) and leaves another embedded binomial tree to exploit.

Python demonstration: this prints (k+1) * 2**k where k is the argument to example, representing an approximate operation count for O(2**k) operations on O(2**k) keys.

p = {}     # parent pointers: p[x] is x's parent; a key absent from p is a root
count = 0  # global counter of parent-pointer traversals, reset by example()

def find(a):
    global count
    # First pass: walk up from a to its root, counting every step.
    b = a
    while b in p:
        count += 1
        b = p[b]
    # Second pass: path compression - point every node on the path directly at the root.
    while a in p:
        pa = p[a]
        p[a] = b
        a = pa
    return b

def union(a, b):
    # Naive linking: make a's root point to b's root (no union by rank or size).
    p[find(a)] = find(b)

def deepest():
    # Return a node of maximum depth in the current forest (brute force).
    maxd = (0, None)
    for a, pa in p.items():
        d = 1
        while pa in p:
            d += 1
            pa = p[pa]
        maxd = max(maxd, (d, a))
    return maxd[1]

def example(k):
    global count, p
    count = 0
    p.clear()
    # Phase 1: build a binomial tree of depth k on the keys 0 .. 2**k - 1
    # (each a ends up pointing to a with its lowest set bit cleared).
    for a in range(2 ** k - 1, 0, -1):
        union(a, a & (a - 1))
    # Phase 2: repeatedly hang the whole tree under a fresh singleton
    # and do an expensive find on the deepest node.
    r = 0
    for a in range(2 ** k, 2 ** (k + 1)):
        union(r, a)
        find(deepest())
        r = a

example(9)
print(count)
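
As a quick sanity check (a small self-contained sketch, not part of the code above): because the first loop processes the keys in decreasing order, both arguments of each union are still roots, so node a simply ends up with parent a & (a - 1). For k = 3 that is exactly a binomial tree of depth 3 rooted at 0:

k = 3
parents = {a: a & (a - 1) for a in range(2 ** k - 1, 0, -1)}
print(sorted(parents.items()))
# [(1, 0), (2, 0), (3, 2), (4, 0), (5, 4), (6, 4), (7, 6)]

Each round of the second loop then hangs this whole tree under a fresh singleton and does a find on its deepest node, which costs a logarithmic number of steps and, as described above, leaves another embedded binomial tree for the next round.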
David Eisenstat

or is that complexity just theoretical, not practical?

Yes. The complexity of an algorithm is a purely theoretical construct that describes how its running time scales with the size of the input (in this case, the numbers of finds and unions).

This does not give any guarantee on the number of steps required for a particular input instance (say, 5 finds and 3 unions), other than that it is finite, of course. In fact, big O notation hides an arbitrarily large (but fixed) multiplicative constant, which does not help in calculating exact runtimes but is enough to separate algorithms into complexity classes.
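
To spell that out: a statement like T(m, n) = O((m + n) log n) only promises that there are fixed constants c and N with T(m, n) <= c * (m + n) * log n once the input is large enough (say m + n >= N); both c and the behaviour on small inputs are left unspecified, which is why it says nothing exact about a particular run with 5 finds and 3 unions.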

Bergi
  • Note that the big O is meant to describe the WORST POSSIBLE OUTCOME. In many cases, the performance will exceed the worst outcome by the nature of the arrangement of data -- basically that, on average, you're luckier than average. – PaulProgrammer Jul 21 '14 at 13:58
  • @PaulProgrammer: Not exactly, you can do [best, worst, or some average case](https://en.wikipedia.org/wiki/Best,_worst_and_average_case) complexity analysis. Big O notation is used for average case analysis as well. – Bergi Jul 21 '14 at 14:00