17

The Wikipedia article https://en.wikipedia.org/wiki/Quickselect says:

"However, instead of recursing into both sides, as in quicksort, quickselect only recurses into one side – the side with the element it is searching for. This reduces the average complexity from O(n log n) to O(n), with a worst case of O(n^2)."

I don't understand how recursing into only one side reduces the average complexity to O(n). Wouldn't it be more like O(N/2 log N), which is still O(N log N)? And how is the worst case O(n^2)?

ealeon

4 Answers

37

n log(n) implies that the algorithm looks at all N items log(n) times. But that's not what's happening with Quickselect.

Let's say you're using Quickselect to pick the top 8 items in a list of 128. And by some miracle of random selection, the pivots you pick are always at the halfway point.

On the first iteration, the algorithm looks at all 128 items and partitions into two groups of 64 items each. The next iteration splits into two groups of 32 items each. Then 16, and then 8. The number of items examined is:

N + N/2 + N/4 + N/8 + N/16

The sum of that series will never reach 2*N.

The worst case is that partitioning always results in very skewed partition sizes. Consider what would happen if the first partitioning only removed one item. And the second only removed one, etc. The result would be:

N + (N-1) + (N-2) ...

Which is (n^2 + n)/2, or O(n^2).
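
If you want to see this behaviour empirically, here is a minimal Python sketch (the random pivot choice, the list-based three-way partition, and the `examined` counter are my own illustration, not part of the original algorithm description); it finds the k-th smallest element and reports how many elements it looked at relative to N:

    import random

    def quickselect(arr, k):
        """Return the k-th smallest element of arr (k is 1-based).

        Illustrative sketch only: random pivot, list-based three-way
        partition, plus a counter of how many elements are examined.
        """
        examined = 0

        def select(items, k):
            nonlocal examined
            examined += len(items)            # every element in this pass is looked at once
            pivot = random.choice(items)
            lows   = [x for x in items if x < pivot]
            pivots = [x for x in items if x == pivot]
            highs  = [x for x in items if x > pivot]
            if k <= len(lows):                # answer lies in the smaller-than-pivot part
                return select(lows, k)
            if k <= len(lows) + len(pivots):  # answer is the pivot itself
                return pivot
            return select(highs, k - len(lows) - len(pivots))

        return select(list(arr), k), examined

    data = random.sample(range(1_000_000), 100_000)
    value, examined = quickselect(data, 8)                  # 8th smallest of 100,000 items
    print(value == sorted(data)[7], examined / len(data))   # True, and a small constant multiple of N

With a pivot that always lands exactly in the middle, the counter would be N + N/2 + N/4 + ..., i.e. below 2N, as described above.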

Jim Mischel
  • Actually with random pivots you will almost always have uneven splits, and the element you are looking for is more likely than not on the larger side. The result is that it takes around `3.4n` comparisons to find the median. Still it is correct that you are very likely to be bounded above by a geometric series that adds up to `O(n)`. – btilly Jul 08 '19 at 20:27
  • @btilly Whereas I understand that you will almost always have uneven splits, why do you say that they're more likely to be on the larger side? – Jim Mischel Jul 09 '19 at 02:38
  • Suppose that your split was 2/3-1/3. 2/3 of the answers you might be looking for are in the 2/3 bucket, and 1/3 are in the 1/3 bucket, so you're more likely than not to be looking for something in the larger bucket. – btilly Jul 09 '19 at 15:23
  • The size of the effect depends what you are looking for. If you're looking for something near an end, the odds of randomly being in the larger are nearly even. If you're looking for the median, the only way to not be in the larger group is for the median to be the randomly selected pivot (odds `O(1/n)`). This is why the median is a hard case for quickselect. – btilly Jul 09 '19 at 15:31
  • @btilly Do you know/have a proof/link for how you got that exact decimal `3.4n`? – theprogrammer Mar 12 '21 at 01:11
  • @theprogrammer See https://en.wikipedia.org/wiki/Quickselect#Variants – btilly Mar 12 '21 at 03:57
20

I also felt conflicted at first when I read that the average time complexity is O(n) even though we break the list in half each time (as in binary search or quicksort). To show that only looking at one side reduces the average runtime complexity from O(n log n) to O(n), let's compare the time-complexity recurrence relations of quicksort (two-sided) and quickselect (one-sided).

Quicksort:

T(n) = n + 2T(n/2)
     = n + 2(n/2 + 2T(n/4))
     = n + 2(n/2) + 4T(n/4)
     = n + 2(n/2) + 4(n/4) + ... + n(n/n)
     = 2^0(n/2^0) + 2^1(n/2^1) + ... + 2^log2(n)(n/2^log2(n))
     = n (log2(n) + 1)      (since we are adding n to itself log2(n) + 1 times)
 

Quickselect:

 T(n) = n + T(n/2)
      = n + n/2 + T(n/4)
      = n + n/2 + n/4 + ... + n/n
      = n(1 + 1/2 + 1/4 + ... + 1/2^log2(n))
      < n (1/(1-(1/2))) = 2n                   (bounding by the infinite geometric series)

I hope this convinces you why looking at one side makes the difference!
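
If the algebra feels abstract, you can also evaluate both recurrences numerically. Here is a small Python sketch (the base case T(1) = 1 is an assumption I added so the recurrences are computable):

    import math
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def t_quicksort(n):
        """T(n) = n + 2*T(n/2), T(1) = 1: the two-sided recurrence above."""
        return 1 if n <= 1 else n + 2 * t_quicksort(n // 2)

    @lru_cache(maxsize=None)
    def t_quickselect(n):
        """T(n) = n + T(n/2), T(1) = 1: the one-sided recurrence above."""
        return 1 if n <= 1 else n + t_quickselect(n // 2)

    for n in (2**10, 2**15, 2**20):
        print(n,
              round(t_quicksort(n) / (n * math.log2(n)), 3),  # stays near 1  -> Theta(n log n)
              round(t_quickselect(n) / n, 3))                 # stays near 2  -> Theta(n)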

Aron Vischjager
18

A picture is worth a hundred lines:

The sum of this kind of sequence gets arbitrarily close to 1 but never equals 1.
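
In code form (my own illustration of the same idea), the partial sums of 1/2 + 1/4 + 1/8 + ... creep toward 1 but never reach it:

    total = 0.0
    for i in range(1, 21):
        total += 1 / 2**i            # 1/2 + 1/4 + 1/8 + ...
        if i in (1, 2, 5, 10, 20):
            print(i, total)          # 0.5, 0.75, 0.96875, 0.999023..., 0.999999...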

JasonLi
0

Its average-case complexity is Θ(N).

Let's say that the target index is 1, which means we want to find the minimal element:

  • 1st loop: Examine the whole [1, N] and partition, nearly N operations.
  • 2nd loop: Examine the whole [1, x] and partition, nearly N/2 operations.
  • 3rd loop: Examine the whole [1, y] and partition, nearly N/4 operations.
  • ...

Final loop: Examine [1, 1]; arr[1] is our target, 1 operation.

Therefore, the complexity is about:

T = N + N/2 + N/4 + ... + 1
  = 1 + 2 + 4 + ... + 2^(log2(N))
  = 2^(log2(N) + 1) - 1          (geometric series)
  = 2N - 1
  = Θ(N)

This derivation is simplified, but I hope it helps. Note that this is the average complexity of quickselect; the actual performance fluctuates depending on the distribution of values in the list and on the target index. See https://en.wikipedia.org/wiki/Quickselect for more details.
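
A short Python trace of these idealized loop sizes (assuming, as above, that every partition halves the remaining range exactly) confirms the closed form:

    N = 1 << 20                      # any power of two works here
    sizes, n = [], N
    while n >= 1:                    # N, N/2, N/4, ..., 1
        sizes.append(n)
        n //= 2
    print(sizes[:5], "...")          # [1048576, 524288, 262144, 131072, 65536] ...
    print(sum(sizes), 2 * N - 1)     # both 2097151, i.e. Theta(N)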

aeof