
There is a problem: "Find the element repeated more than n/2 times."

Can you please help me estimate the time complexity of a solution that uses randomness:

  1. Pick a random element from the array.
  2. Iterate through the array and count the number of occurrences of the selected element.
  3. If the count is > N/2, return that element.
  4. Otherwise, repeat from step 1.

What would be the worst case for this method, if I use a perfect random generator that produces uniformly distributed numbers? O(N²)?

My intuition says that on average it should give the answer in two tries, but that is only the average case. How can I prove it? I'm not quite sure how to estimate the running time of randomized algorithms.
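
In code, the procedure I have in mind is roughly this (a Python sketch; the function and variable names are only illustrative):

    import random

    def find_majority(arr):
        # Assumes an element occurring more than len(arr)/2 times actually exists.
        n = len(arr)
        while True:
            candidate = arr[random.randrange(n)]             # 1. pick a random element
            count = sum(1 for x in arr if x == candidate)    # 2. count its occurrences
            if count > n // 2:                               # 3. majority check
                return candidate
            # 4. otherwise repeat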

vmg

4 Answers


Assuming that there is actually an element that appears more than n/2 times, the expected running time is O(n). You can think about it this way: each time you choose an element, you need to do O(n) work to check whether it's the majority element. The question then is how many elements, in expectation, you're going to have to pick.

Each time you choose an element at random, you have at least a 1/2 probability of picking the majority element. In expectation, that means you'll need to pick two elements before you find the majority element, so the runtime will be O(n). If you're curious why, notice that the probability that you find what you're looking for after exactly k probes (k > 0) is at most 2^-k, since you need the first k - 1 probes to fail and then the kth probe to succeed. You can then compute the expected value as

0 * 2^-0 + 1 * 2^-1 + 2 * 2^-2 + 3 * 2^-3 + ... = 2

(This summation is known to work out to exactly two, though proving it is a bit messy.)
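
As a quick numerical sanity check (my own addition, not part of the original argument), the partial sums of that series do settle at 2:

    # Partial sums of 0*2^-0 + 1*2^-1 + 2*2^-2 + ... approach 2.
    total = 0.0
    for k in range(200):
        total += k * 2.0 ** (-k)
    print(total)  # ~2.0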

In the worst case, though, every time you pick an element you'll choose something that isn't a majority element. This is extraordinarily unlikely: the probability that you don't find a majority element after k rounds is at most 2^-k. For k = 300, this number is less than one over the number of atoms in the universe. Therefore, even though you can't bound the worst-case runtime, it's so astronomically unlikely to be large that you can safely ignore it.

Hope this helps!

templatetypedef

There is no bound for the worst case running time of this algorithm.

The "perfect" random number generator's output cannot be contingent on the previous history; otherwise it would be imperfect (real-world pseudo-rngs are imperfect in this way, so you might be able to construct a real bound for a specific PRNG).

Consequently, it could take an arbitrary number of guesses before the RNG guesses one of the majority element's positions.

If you're allowed to rearrange the array, you could swap the wrong guess to the beginning of the (still unknown) part of the array, and then restrict the guesses to as-yet-unguessed positions. That would limit the number of iterations to n/2-1, so the worst-case running time for the algorithm would be O(n²).
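
A rough Python sketch of that variant (my own illustration with made-up names; it rearranges the array in place) might look like this:

    import random

    def find_majority_without_repeats(arr):
        # After a wrong guess, swap that element into the already-guessed prefix
        # and only guess among the as-yet-unguessed positions.
        n = len(arr)
        guessed = 0                               # arr[:guessed] holds retired wrong guesses
        while True:
            i = random.randrange(guessed, n)      # pick an unguessed position
            candidate = arr[i]
            if sum(1 for x in arr if x == candidate) > n // 2:
                return candidate
            arr[guessed], arr[i] = arr[i], arr[guessed]
            guessed += 1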

Although it has no impact on the big-O runtime, you can almost always stop the count scan early, either because you've already found n/2+1 elements or because there are not enough unscanned elements left to bring the count up to that threshold. Even with that optimization, the worst-case (alternating elements) time for a single scan is still n and the expected scan is still O(n).
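
The early-exit count could be sketched like this (again only illustrative):

    def count_reaches_majority(arr, candidate):
        # Stop as soon as the count passes n/2, or as soon as even counting
        # every remaining element could no longer reach that threshold.
        n = len(arr)
        need = n // 2 + 1          # smallest count that is strictly greater than n/2
        count = 0
        for seen, x in enumerate(arr, start=1):
            if x == candidate:
                count += 1
                if count >= need:
                    return True
            if count + (n - seen) < need:
                return False
        return False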

rici

For randomized algorithms, the expected running time is the better way to characterize their performance. For the algorithm you described, the expected running time is at most

S = n * 1/2  + 2n * 1/2^2 + 3n * 1/2^3 + ... up to infinity

We can solve this as follows:

S =      n/2 + 2n/2^2 + 3n/2^3  + ... up to infinity
2S = n + 2n/2 + 3n/2^2 + 4n/2^3 + ... up to infinity

(subtracting the top from bottom)
S = n + n/2 + n/4 + n/8 + ... up to infinity
  = 2n 

So the expected runtime is O(n).
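
A small Monte Carlo check (my own, not part of the answer) on an input where the majority element appears only ⌊n/2⌋+1 times comes out at about 2 counting passes per run, i.e. roughly 2n work:

    import random

    n = 1001
    majority = [1] * (n // 2 + 1)             # floor(n/2) + 1 copies of the majority element
    others = list(range(2, n - n // 2 + 1))   # the rest are distinct non-majority values
    arr = majority + others

    runs, passes = 10_000, 0
    for _ in range(runs):
        while True:
            passes += 1                       # each pick costs one O(n) counting pass
            candidate = random.choice(arr)
            if arr.count(candidate) > n // 2:
                break
    print(passes / runs)                      # ~2 (slightly below, since P(success) > 1/2)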

krjampani
  • O(n) is correct, but the number of elements examined will be less than n in the last scan (probably) since you can stop when the count reaches (n+1)/2, which will probably happen before you see n elements (and will certainly happen before you see n elements if there are more than (n+1)/2 instances of the most common element). – rici Apr 14 '15 at 02:10

If we talk about worst-case complexity, we mean a worst case over the inputs, i.e. an input that forces the algorithm into its worst possible running time.

This is the same for randomized algorithms: we calculate the expected complexity for a worst-case input.

In your example, a worst-case input would be an array of length n that contains some number a only ⌊n/2⌋+1 times.

The complexity is then O(n)⋅E[X], where X is the number of tries it takes to randomly pick a number from the array until you pick a. If a appears m times in the array, then E[X] = n/m. So for our worst-case input we get E[X] = n/(⌊n/2⌋+1) < n/(n/2) = 2.

So this randomized algorithm has a worst-case complexity of O(n).
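
As a concrete illustration (the sizes are my own), plugging a few values of n into E[X] = n/(⌊n/2⌋+1) shows that the expected number of picks stays below 2:

    for n in (10, 11, 1000, 1001):
        m = n // 2 + 1            # the majority element appears floor(n/2) + 1 times
        print(n, n / m)           # E[X] = n/m, always < 2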

AbcAeffchen
  • `O(n)⋅E[X]` is *expected* complexity, not worst-case. – chepner Apr 14 '15 at 02:41
  • @chepner It's not that easy... That is true if the randomness is over the bits of the *input*. Maybe some people make no distinction between the randomness of the algorithm and the randomness of the input, but I think that makes no sense in this and most other cases. In the end it comes down to which definition of the complexity of a randomized algorithm one uses. – AbcAeffchen Apr 14 '15 at 02:58
  • Worst-case complexity is very well defined for randomized algorithms, and has nothing to do with the input. That's the very reason we define expected-time complexity for them, so that we can say something more useful about its complexity than "The worst case time complexity is unbounded." – chepner Apr 14 '15 at 03:00
  • @chepner I don't think the comments are the right place to discuss this. I know that there is at least one different definition used by some professors. If you are interested, we can start a chat. – AbcAeffchen Apr 14 '15 at 03:12