
While reading this question (Find the most common entry in an array), I saw Boyer and Moore's Linear Time Voting Algorithm suggested.

If you follow the link to the site, there is a step-by-step explanation of how the algorithm works. For the given sequence, AAACCBBCCCBCC, it presents the right solution.

When we move the pointer forward over an element e:

  • If the counter is 0, we set the current candidate to e and we set the counter to 1.
  • If the counter is not 0, we increment or decrement the counter according to whether e is the current candidate.

When we are done, the current candidate is the majority element, if there is a majority.
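
A minimal sketch of the pass described above, assuming the input is a string of characters. The name `voting_candidate` is hypothetical and is not taken from the linked page:

```cpp
#include <string>

// Hypothetical helper: runs only the candidate-selection pass described above.
// Returns the final candidate, or '\0' for an empty input.
char voting_candidate(const std::string& s) {
    char candidate = '\0';
    int counter = 0;
    for (char e : s) {
        if (counter == 0) {
            // Counter is 0: e becomes the new candidate.
            candidate = e;
            counter = 1;
        } else if (e == candidate) {
            ++counter;  // e supports the current candidate
        } else {
            --counter;  // e cancels out one occurrence of the candidate
        }
    }
    return candidate;
}

// voting_candidate("AAACCBBCCCBCC") returns 'C'.
```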

If I run this algorithm on paper with AAACCBB as input, the suggested candidate becomes B, which is obviously wrong.

As I see it, there are two possibilities:

  1. The authors have never tried their algorithm on anything other than AAACCBBCCCBCC, are completely incompetent and should be fired on the spot (doubtful).
  2. I am clearly missing something, must get banned from Stack Overflow and never be allowed again to touch anything involving logic.

Note: Here is a C++ implementation of the algorithm from Niek Sanders. I believe he correctly implemented the idea and as such it has the same problem (or doesn't it?).

Community
Lieven Keersmaekers

5 Answers


The algorithm only works when the set has a majority, that is, when more than half of the elements are the same. AAACCBB in your example has no such majority: the most frequent letter occurs 3 times, while the string length is 7.

rmmh
Rafał Dowgird
  • Happens to everyone. Do not be too strict in carrying out point 2. from your answer :) – Rafał Dowgird Apr 23 '09 at 09:32
  • OK, now I see it. The algorithm states that "This algorithm decides which element of a sequence is in the majority". I overlooked the majority part at first and assumed it was talking about the element appearing the maximum number of times. The majority here means that the element should appear at least half the "number of elements" times! – texens Sep 03 '13 at 21:00
  • You mean "more than half", instead of "at least half". – Hiroki Osame Jan 27 '14 at 17:26

A small but important addition to the other explanations. Moore's Voting algorithm has 2 parts:

  1. The first part of running Moore's Voting algorithm only gives you a candidate for the majority element. Notice the word "candidate" here.

  2. In the second part, we need to iterate over the array once again to determine whether this candidate really is the majority element (i.e. occurs more than size/2 times).

The first iteration finds the candidate and the second iteration checks whether that element occurs a majority of times in the given array.

So the time complexity is: O(n) + O(n) = O(n)
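
The two parts could be sketched roughly as follows. This is only an illustration, assuming C++17 (for `std::optional`); the name `majority_element` is hypothetical:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: returns the majority element of v, or std::nullopt
// if no element occurs more than v.size()/2 times.
template <typename T>
std::optional<T> majority_element(const std::vector<T>& v) {
    if (v.empty()) return std::nullopt;

    // Part 1: voting pass, which only yields a candidate.
    T candidate = v.front();
    std::size_t count = 0;
    for (const T& e : v) {
        if (count == 0) {
            candidate = e;
            count = 1;
        } else if (e == candidate) {
            ++count;
        } else {
            --count;
        }
    }

    // Part 2: verification pass, counting the candidate's occurrences.
    std::size_t occurrences = 0;
    for (const T& e : v) {
        if (e == candidate) ++occurrences;
    }
    if (occurrences > v.size() / 2) return candidate;
    return std::nullopt;
}
```

Both loops touch each element once, so the whole function stays linear in the size of the input.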

TheCrazyProgrammer
Srikar Appalaraju
  • +1 I was about to write about the completely overlooked fact that a second iteration is necessary to verify the candidate. Hopefully the OP will notice this late answer. – imreal Jul 23 '13 at 19:34
  • The explanation from the author himself: http://www.cs.utexas.edu/users/moore/best-ideas/mjrty/index.html – Anupam Saini Dec 29 '14 at 10:11
  • Step 1 doesn't give you the most frequent candidate. AAABBCC will give you C as the final candidate, but C is neither the most frequent element nor the majority. Then you run the 2nd pass to see that this array has no majority. – lineil Feb 01 '16 at 05:04

From the first linked SO question:

with the property that more than half of the entries in the array are equal to N

From the Boyer and Moore page:

which element of a sequence is in the majority, provided there is such an element

Both of these algorithms explicitly assume that one element makes up more than half of the entries. (Note in particular that "majority" is not the same as "most common.")
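
To make the distinction concrete, here is a small sketch that computes "most common" by plain counting and tests "majority" separately. The helper names `most_common` and `is_majority` are hypothetical, and the counting approach is not the voting algorithm itself:

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: the most frequent character in s, found by counting.
// Ties go to the smallest character; an empty string yields '\0'.
char most_common(const std::string& s) {
    std::map<char, std::size_t> freq;
    for (char c : s) ++freq[c];

    char best = '\0';
    std::size_t best_count = 0;
    for (const auto& entry : freq) {
        if (entry.second > best_count) {
            best = entry.first;
            best_count = entry.second;
        }
    }
    return best;
}

// Hypothetical helper: true only if c occurs in more than half of s.
bool is_majority(const std::string& s, char c) {
    std::size_t n = 0;
    for (char e : s) {
        if (e == c) ++n;
    }
    return n > s.size() / 2;
}
```

For AAACCBB, `most_common` gives A (3 occurrences), yet `is_majority` rejects it because 3 is not more than half of 7.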

j_random_hacker

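I wrote C++ code for this algorithm:

// Returns the element that occurs in more than half of arr,
// or -1 if there is no such element.
char find_more_than_half_shown_number(char* arr, int len){
    char candidate=0;
    int count=0;
    // First pass: voting, which leaves the only possible majority candidate.
    for(int i=0;i<len;i++){
        if(count==0){
            candidate=arr[i];
            count=1;
        }else if(candidate==arr[i]){
            count++;
        }else{
            count--;
        }
    }
    // Second pass: count how often the candidate really occurs.
    int tmp_count=0;
    for(int i=0;i<len;i++){
        if(arr[i]==candidate)
            tmp_count++;
    }
    // A majority needs strictly more than half of the elements.
    if(tmp_count>len/2)
        return candidate;
    else
        return -1;
}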

and the main function is as below:

#include <iostream>

int main(int argc, const char * argv[])
{
    char arr[]={'A','A','A','C','C','B','B','C','C','C','B','C','C'};
    int len=sizeof(arr)/sizeof(char);
    // 'C' occurs 7 times out of 13, so it is the majority element.
    char rest_num=find_more_than_half_shown_number(arr,len);
    std::cout << "rest_num="<<rest_num<<std::endl;
    return 0;
}
feliciafay

When the test case is "AAACCBB", the set has no majority: the length of "AAACCBB" is 7 and no element occurs more than 3 times.

Here's the code for Boyer and Moore's Linear Time Voting Algorithm:

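// Returns the candidate left by the voting pass. It is the majority element
// only if num actually has one; an empty input yields 0.
int Voting(vector<int> &num) {
    int count = 0;
    int candidate = 0;

    for(size_t i = 0; i < num.size(); ++i) {
        if(count == 0) {
            candidate = num[i];
            count = 1;
        }
        else
            count += (candidate == num[i]) ? 1 : -1;
    }
    return candidate;
}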
Peter Rui