32

There is an array (of size N) with an element repeated more than N/2 number of time and the rest of the element in the array can also be repeated but only one element is repeated more than N/2 times. Find the number.

I could think of few approaches:

  • Naive, keep the count of each number in a hash map.
  • Simplest, sort the array and the number at n/2+1 th index is the required number.
  • Keep count of only consecutive duplicate values found. Check separately for the pattern where the values are stored alternatively.

Unable to think of a better solution, there has to be.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
SnoopyMe
  • 1,015
  • 1
  • 12
  • 21
  • 2
    From WIKI: The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. Basically sort and give your number :) – sll Aug 14 '11 at 21:37
  • 2
    you cannot go faster than O(N). Sorting is overkill. – Tomas Aug 15 '11 at 13:34
  • http://stackoverflow.com/questions/744981/array-of-size-n-with-one-element-n-2-times – Trying Jul 21 '13 at 02:17

7 Answers7

66

There is a beautiful algorithm for solving this that works in two passes (total time O(N)) using only constant external space (O(1)). I have an implementation of this algorithm, along with comments including a correctness proof, available here

The intuition behind the algorithm is actually quite beautiful. Suppose that you were to have a roomful of people each holding one element of the array. Whenever two people find each other where neither is holding the same array element as the other, the two of them sit down. Eventually, at the very end, if anyone is left standing, there's a chance that they're in the majority, and you can just check that element. As long as one element occurs with frequency at least N/2, you can guarantee that this approach will always find the majority element.

To actually implement the algorithm, you make a linear scan over the array and keep track of your current guess as to what the majority element is, along with the number of times that you've seen it so far. Initially, this guess is undefined and the number of repeats is zero. As you walk across the array, if the current element matches your guess, you increment the counter. If the current element doesn't match your guess, you decrement the counter. If the counter ever hits zero, then you reset it to the next element you encounter. You can think about this implementation as a concrete realization of the above "standing around in a room" algorithm. Whenever two people meet with different elements, they cancel out (dropping the counter). Whenever two people have the same element, then they don't interact with each other.

For a full correctness proof, citation to the original paper (by Boyer and Moore of the more famous Boyer-Moore string matching algorithm), and an implementation in C++, check out the above link.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • 2
    The interesting code section on your blog is very nice, another great explanation +1. – nikhil Jul 17 '12 at 09:39
  • Can someone help me understand what is the resetting the counter when it hits zero. What is the use of this step ? what will happen if we don't do it. – ashish2199 Aug 02 '18 at 02:33
16

This is the Majority element problem. There is a single pass, constant space algorithm for this problem. Here is a brief algorithm coded in python:


    import random

    items = [1, 2, 3, 4, 5, 5, 5, 5, 5 ]
    # shuffle the items
    random.shuffle(items)

    print("shuffled items: ", items)

    majority_elem = items[0]
    count = 1
    for i in range(1,len(items)):
        if items[i] == majority_elem:
            count += 1
        else: 
            count -= 1
            if count == 0:
                majority_elem = items[i]
                count = 1

    print("majority element : %d" % majority_elem )

  

We use a variable majority_elem to keep track of majority element and a counter (count)

  • Initially we set the first element of the array as the majority element.

  • we navigate through the array,

  • if the current element == majority element : increment count

  • else : { decrement count. if count becomes zero, set count = 1 and set majority_element = current element. }

There is a variation to this problem, instead of an array, there could be a very large sequence and we do not know the length before hand. If this case, sorting or partioning is not helpful.

References:

  • The Art of Computer Programming, Fascicle 0: Introduction to Combinatorial Algorithms and Boolean Functions, Volume 0; Volume 4
vine'th
  • 4,890
  • 2
  • 27
  • 27
  • 1
    Very elegant implementation +1 – nikhil Jul 17 '12 at 09:46
  • 1
    No, it's Not a Majority element problem. This algo fails where the array contains 2 or more than 2 numbers, repeating n/k times. – Enkesh Gupta Jul 17 '15 at 14:31
  • @EnkeshGupta : the problem definition specifically says n/2 and 2 is not a valid size as per the problem definition. – vine'th Jul 20 '15 at 07:35
  • 1
    Oh sorry, in didn't have a proper look to it. – Enkesh Gupta Jul 20 '15 at 09:42
  • 1
    What if there is no majority? Returns the last element – Sabri Meviş Dec 11 '15 at 19:41
  • 3
    I'm sorry but your algorithm is not correct. You need to have a second pass to confirm that the element you found is actually the majority by doing a second pass over the array. I agree that the question indicates there is one number in the array, which is repeated more than N/2 times. You should at least add that as a necessary precondition to your answer otherwise the answer is misleading. – Tuxdude Apr 07 '16 at 06:50
7

If an element is repeated more than N/2 times then it must be the median. There are many algorithms that allow you to find this efficiently.

hugomg
  • 68,213
  • 24
  • 160
  • 246
6

No need for sorting. You can simply use a median selection algorithm to determine the n/2-th element. Quickselect runs in O(n) expected time. Median of medians runs in O(n).

Frigo
  • 1,709
  • 1
  • 14
  • 32
6

Are you familiar with quicksort? It has a function called 'partition' that, given a value, divides the array into a section where all values are greater than the value (the pivot) are on one side, while all values less than the value are on the other side. Note that this is not a sort, simply a separation. The N/2 count item will be in the larger of the two sections. You can recursively apply this technique to find the element in O(n) time.

wikipedia: quicksort, or Partition-based general selection algorithm

Jim
  • 641
  • 4
  • 9
  • Can you elaborate on how this works in time O(n)? If the array consists of n+1 copies of one element and n other unique elements, then you could conceivably get unlucky with how you pick your pivots and have the runtime degrade to O(n^2). – templatetypedef Jul 08 '22 at 23:40
6

In your second approach, you are essentially selecting the median element. Take a look at algorithms for finding the median of a list of numbers. In particular, a selection algorithm would work fine for this and compute it in O(n).

Hoare's selection algorithm works very similar to quick sort, except that instead of recursing down both partitions, it only recurses down one partition (the partition that contains the kth element).

In C++, the standard library provides a selection algorithm in the form of std::nth_element, which guarantees O(n) average complexity. You can use this find the median.

int a[8] = {5, 1, 1, 1, 2, 1, 3, 1};
int median = *std::nth_element(a, a + 4, a + 8);

Note that std::nth_element will also partially sort the array in place.

Peter Alexander
  • 53,344
  • 14
  • 119
  • 168
4

Sort the array using any sorting algorithm. The element which was repeated more than half of the time will always be the mid element.The complexity will be nlog(n).

user3107673
  • 423
  • 4
  • 9