34

Given an array with N elements. We know that one of those elements repeats itself at least N/2 times.

We don't know anything about the other elements . They may repeat or may be unique .

Is there a way to find out the element that repeats at least N/2 times in a single pass or may be O(N)?

No extra space is to be used .

Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985
Flash
  • 2,901
  • 5
  • 38
  • 55

8 Answers8

56

As the other users have already posted the algorithm, I won't repeat that. However, I provide a simple explanation as to why it works:

Consider the following diagram, which is actually a diagram of unpolarized light:

arrows radiating from the centre

Each arrow from the centre represents a different candidate. Imagine a point somewhere on an arrow representing the counter and candidate. Initially the counter is at zero, so it begins in the centre.
When the current candidate is found, it moves one step in the direction of that arrow. If a different value is found, the counter moves one step towards the centre.
If there is a majority value, more than half of the moves will be towards that arrow, and hence the algorithm will end with the current candidate being the majority value.

Nabb
  • 3,434
  • 3
  • 22
  • 32
39

st0le answered the question, but here's a 5minute implementation:

#include <stdio.h>

#define SIZE 13

int boyerMoore(int arr[]) {
    int current_candidate = arr[0], counter = 0, i;
    for (i = 0; i < SIZE; ++i) {
        if (current_candidate == arr[i]) {
            ++counter;
            printf("candidate: %i, counter: %i\n",current_candidate,counter);
        } else if (counter == 0) {
            current_candidate = arr[i];
            ++counter;
            printf("candidate: %i, counter: %i\n",current_candidate,counter);
        } else {
            --counter;
            printf("candidate: %i, counter: %i\n",current_candidate,counter);
        }
    }
    return current_candidate;
}

int main() {
    int numbers[SIZE] = {5,5,5,3,3,1,1,3,3,3,1,3,3};
    printf("majority: %i\n", boyerMoore(numbers));
    return 0;
}

And here's a fun explanation (more fun than reading the paper, at least): http://userweb.cs.utexas.edu/~moore/best-ideas/mjrty/index.html

David Titarenco
  • 32,662
  • 13
  • 66
  • 111
  • Thanks! Quite beautiful idea. (BTW, you would surely get some more upvotes if explained concept to everybody in plain English. It's not difficult at all.) – Nikita Rybak Sep 18 '10 at 04:59
  • 7
    This algorithm satisfies the conditions of the question. However, one should keep in mind that it returns the potential majority item. If there is no majority, then the result is meaningless. Thus, if you are unsure, you have to loop a second time and see how many times that element actually appears. – Matthew Sep 18 '10 at 05:03
  • 3
    The question postulates that `we know that one of those elements repeats itself *at least* N/2 times` so if the data is well-formed, the algorithm will work every time. – David Titarenco Sep 18 '10 at 05:04
  • 1
    +1 for a link to an excellent explanation of a brilliant algorithm I've never seen before. – Mark Rushakoff Sep 18 '10 at 05:45
  • Actually your algorithm will output the incorrect value for inputs such as {1, 1, 2, 3}. This is trivial to fix though. – Nabb Sep 18 '10 at 06:21
  • @Nabb - Moore addresses the caveat: `Note that if you replaced the first C with an A, above, the algorithm would still end with C being chosen, but in fact C would not be the majority element: there is no majority in the modified list. In some situations you know, or assume, there is a majority element. But if you must confirm that the chosen element is indeed the majority element, you must take one more linear pass through the data to do it. ` – David Titarenco Sep 18 '10 at 06:36
  • @David - Sorry, misinterpreted the question (specifically the "repeats itself" part). – Nabb Sep 18 '10 at 07:11
  • 1
    @David, it's possibly incorrect, when assigining a new candidate, your count should be assigned `=1` – st0le Sep 19 '10 at 06:02
  • I can see this is a pretty old post, but want to point out that this solution given by @David is not proper whereas the implementation given by st0le seems to be perfect !! One example where above implementation will not work is If an element repeats exactly N/2 times like this a[] = {2, 1, 3, 1, 4, 1, 5, 1}; – SK. Oct 08 '12 at 06:02
  • @S.K - re-read my post above where I quote Moore. He specifically addresses the issue you bring up. – David Titarenco Oct 08 '12 at 22:44
  • @DavidTitarenco it doesn't. Note that the OP wants an element that appears *at least* n/2 times, as opposed to *more than* n/2 times. -1 until the code is corrected (just check both elements that are eliminated in the last vote, if the final candidate count is 0). – ffao Oct 10 '12 at 14:28
  • @ffao - I don't think you understand how Boyer-Moore works (which is what I implemented). The definition of a *majority* element is that it's the element that occurs > than N/2 times. If the element occurs exactly N/2 times, the problem cannot be solved in O(N) which is what the question postulated (please read what Moore said). Again, this has been covered already. Please don't downvote blindly. – David Titarenco Oct 10 '12 at 14:46
  • But you have to answer the question, and the question did not ask for an implementation of Boyer-Moore or to find a majority element. "If the element occurs exactly N/2 times, the problem cannot be solved in O(N)". Please do not make blind assertions (this is trivially possible, and I already said how). – ffao Oct 10 '12 at 19:21
  • The link is broken :S – Mohamed Taher Alrefaie Feb 27 '15 at 16:46
36

The Boyer-Moore Majority Vote Algorithm

//list needs to have an element with a count of more than n/2 throughout itself for
//this algorithm to work properly at all times.

lst = [1,2,1,2,3,1,3,3,1,2,1,1,1]

currentCount = 0
currentValue = lst[0]
for val in lst:
   if val == currentValue:
      currentCount += 1
   else:
      currentCount -= 1

   if currentCount == 0:
      currentValue = val
      currentCount = 1


print(currentValue)
st0le
  • 33,375
  • 8
  • 89
  • 89
  • 1
    It's pretty simple, you could easily implement it. – st0le Sep 18 '10 at 04:28
  • _pretty simple_ Care to explain to the rest of us? – Nikita Rybak Sep 18 '10 at 04:39
  • Added an answer with a simple implementation. – David Titarenco Sep 18 '10 at 04:52
  • @st0le: Your sample is not correct. You use the index i instead of the list value lst[i] in comparision as well as in assignment. Could you fix it? – harper Sep 18 '10 at 07:46
  • @harper, `i` contains the value itself...i guess, `i` convention threw you off. i'll rename it. – st0le Sep 18 '10 at 08:39
  • @st0le: Apologize please. Your right, the 'for' notation reminded to C++ syntax, that screw me. – harper Sep 18 '10 at 09:24
  • Has the same issue as the code above, fails for lst = [1,2,1,3,1,4,1,5]. Plus using // to comment Python code is not exactly nice. – ffao Oct 10 '12 at 14:33
  • @ffao - you don't seem to understand the caveats of Boyer-Moore or the O(N) requirement, st0le gives a very good answer. – David Titarenco Oct 10 '12 at 14:48
  • @ffao: number of 1 is 4 it must be greater than the n/2; here it is equal to n/2. It is not guaranteed when the number alternates, have same elements of that array in a different fashion like this: [1,2,3,1,5,4,1,1] and this will work – brain storm Feb 22 '14 at 00:17
  • @st0le: why do you start your for loop from index[0], instead you must start from index 1: that is `for val in lst[1:]:`. otherwise, you add extra one to the count of first element – brain storm Feb 22 '14 at 00:21
  • @user1988876, Please check again, I've initialized my count to 0 first, It's perfectly fine. – st0le Feb 22 '14 at 00:25
2

This code is a similar implementation to the way in which we find the majority of an element

int find(int* arr, int size)
{ 
int count = 0, i, m;
  for (i = 0; i < size; i++) 
  {
    if (count == 0)
        m = arr[i];
    if (arr[i] == m) 
        count++;
    else
        count--;
   }
    return m;
}
coder101
  • 840
  • 2
  • 11
  • 29
0

Using the modification suggested by ffao to Davi'd reply:

public class MaxRepeated {

    public static void main(final String[] args) {
        maxRepeatedElement(new int[]{1, 2, 1, 2, 3, 2, 3, 1});
        maxRepeatedElement(new int[]{1, 2, 1, 2, 3, 1, 3, 1});
        maxRepeatedElement(new int[]{1, 2, 1, 2, 4, 1, 1, 3, 1, 3, 1});
        maxRepeatedElement(new int[]{1, 2, 1, 2, 2, 1, 2, 3, 1, 2, 1, 2});
    }

    private static int maxRepeatedElement(final int[] arr) {

        int current_candidate = arr[0];
        int previous_candidate = arr[0];
        int counter = 0, i;
        for (i = 0; i < arr.length; ++i) {
            if (current_candidate == arr[i]) {
                ++counter;
            } else if (counter == 0) {
                previous_candidate = current_candidate;
                current_candidate = arr[i];
                ++counter;
            } else {
                --counter;
            }
            System.out.printf("  candidate: %d, counter: %d\n", current_candidate, counter);
        }

        if (counter == 0) {
            System.out.printf(" possible: %d or %d with net freq %d \n", current_candidate, previous_candidate, counter);
            final int f1 = frequency(arr, current_candidate);
            final int f2 = frequency(arr, previous_candidate);
            final int halfLen = arr.length / 2 + (arr.length % 2 == 0 ? 0 : 1);
            if (f1 >= halfLen || f2 >= halfLen) {
                if (f1 > f2) {
                    System.out.printf("majority: %d with freq %d \n", current_candidate, f1);
                } else {
                    System.out.printf("majority: %d with freq %d \n", previous_candidate, f2);
                }
            } else {
                System.out.printf("NO majority! \n");
            }
        } else {
            System.out.printf("majority: %d with freq %d \n", current_candidate, frequency(arr, current_candidate));
        }
        return current_candidate;
    }

    private static int frequency(final int[] arr, final int candidate) {
        int counter = 0;
        for (int c : arr) {
            counter += candidate == c ? 1 : 0;
        }
        return counter;
    }
}
shams
  • 3,460
  • 24
  • 24
0

Try this :

#include<iostream>
using namespace std;
int main()
{
    int counter=0;
    int a[]={10, 11, 5, 27, 4, 2, 7, 5, 7, 11, 9, 5, 5, 4, 10, 7, 5, 3, 7, 5};
    for(int i = 0; i < 20; i++)
    {
        if(a[i]==5)
        counter++;
    }
    cout << "it appears " << counter << " times";
}
Andrii Abramov
  • 10,019
  • 9
  • 74
  • 96
  • 1
    How does this answer the problem. This only shows you the number of fives. It does not find which number has the most repeats. – NathanOliver Jan 22 '16 at 17:49
0

It doesn't seem possible to count anything without using extra space. You have to store atleast one counter somewhere. If you mean to say you cannot use more than O(n) space then it should be fairly easy.

One way would be to create a second list of only unique objects from the original list. Then, create a third list the same length as the second with a counter for the number of occurrences of each item in the list.

Another way would be to sort the list then find the largest contiguous section.

jamesbtate
  • 1,359
  • 3
  • 19
  • 25
  • +1 - probably not the optimal solution, but O(n log n) for the sort-based solution is a good trade-off versus complexity of other methods. – Will A Sep 18 '10 at 04:34
  • You aren't trying to count. You are trying to find a number that occurs at least half the time. – MSN Sep 18 '10 at 05:12
-2

The Boyer-Moore Majority Vote Algorithm fails to find correct majority in the below input arrays

int numbers[SIZE] = {1,2,3,4,1,2,3,4,1,2,3,4};

int numbers[SIZE] = {1,2,3,4,1,2,3,4,1,2,3,4,1};

siva
  • 1,693
  • 2
  • 21
  • 29