Linear time majority algorithm?

Question

Can anyone think of a linear time algorithm for determining a majority element in a list of elements? The algorithm should use O(1) space.

If n is the size of the list, a majority element is an element that occurs at least ceil(n / 2) times.

[Input] 1, 2, 1, 1, 3, 2

[Output] 1

[Editor Note] This question has a technical mistake. I preferred to leave it so as not to spoil the wording of the accepted answer, which corrects the mistake and discusses why. Please check the accepted answer.

score 14 · Accepted Answer · answered Nov 26 '10 at 01:53

I would guess that the Boyer-Moore algorithm (as linked to by nunes and described by cldy in other answers) is the intended answer to the question; but the definition of "majority element" in the question is too weak to guarantee that the algorithm will work.

If n is the size of the list. A majority element is an element that occurs at least ceil(n/2) times.

The Boyer-Moore algorithm finds an element with a strict majority, if such an element exists. (If you don't know in advance that you do have such an element, you have to make a second pass through the list to check the result.)

For a strict majority, you need "... strictly more than floor(n/2) times", not "... at least ceil(n/2) times".

In your example, "1" occurs 3 times, and other values occur 3 times:

Example input: 1, 2, 1, 1, 3, 2

Output: 1

but you need 4 elements with the same value for a strict majority.

It does happen to work in this particular case:

Input: 1, 2, 1, 1, 3, 2
Read 1: count == 0, so set candidate to 1, and set count to 1
Read 2: count != 0, element != candidate (1), so decrement count to 0
Read 1: count == 0, so set candidate to 1, and set count to 1
Read 1: count != 0, element == candidate (1), so increment count to 2
Read 3: count != 0, element != candidate (1), so decrement count to 1
Read 2: count != 0, element != candidate (1), so decrement count to 0
Result is current candidate: 1

but look what happens if the final "1" and the "2" at the end are swapped over:

Input: 1, 2, 1, 2, 3, 1
Read 1: count == 0, so set candidate to 1, and set count to 1
Read 2: count != 0, element != candidate (1), so decrement count to 0
Read 1: count == 0, so set candidate to 1, and set count to 1
Read 2: count != 0, element != candidate (1), so decrement count to 0
Read 3: count == 0, so set candidate to 3, and set count to 1
Read 1: count != 0, element != candidate (3), so decrement count to 0
Result is current candidate: 3

Thanks for pointing that out, Matthew. I had missed that point after going through the description. — Anil Katti, Nov 26 '10 at 02:13
You're making too big a deal of the corner case. You just modify the algorithm so that as soon as count becomes 0, you update the candidate to whatever you just saw, and it works. — ShreevatsaR, Nov 26 '10 at 05:35
@ShreevatsaR: No - that would fix the second case but break the first one. — Matthew Slattery, Nov 26 '10 at 19:09
using a variation of Boyer Moore, I think it is possible to solve the problem as stated, though in several passes (3 or 4, which doesn't change the overall complexity), would you mind reviewing my answer ? — Matthieu M., Dec 02 '10 at 13:13
Could you please check [my question](http://stackoverflow.com/questions/7767222/majority-voting-algorithm-wrong) about this topic and the accepted answer? I believe the proposed algorithm in the accepted answer is correct. — OmarOthman, Oct 27 '11 at 10:31
I think in addition to checking the candidate element as obtained from the majority algorithm, if we also check the last element of array, then the problem will be solved. — ksb, Aug 27 '14 at 12:03
In case of an even number of input elements skip the last element and find the 'strict' majority element. If available it is the 'weak' majority element otherwise there is none. — Wouter, Sep 26 '17 at 22:20

score 9 · Answer 2 · edited Mar 27 '16 at 04:56

9

Boyer-moore algorithm: http://www.cs.utexas.edu/~moore/best-ideas/mjrty/index.html

You scan a list (or stream) and maintain one counter. Initially counter = 0, majority_element = null. As you scan, if the counter is 0, you assume current element as majority element and increment counter. If counter != 0, you increment or decrement the counter according to whether current element is the current majority element.

This algorithm doesn't give you the majority if there isn't one. If you want level of correctness, you would have to make one more pass to validate it is in fact the majority (i.e., >= 50%).

edited Mar 27 '16 at 04:56

Benjamin W.

46,058
19
106
116

answered Nov 25 '10 at 20:30

2

It would be a great answer if you described this a bit. – Matthieu M. Nov 27 '10 at 17:57
1

its straightforward. You scan a list (or stream) and maintain one counter. Initially counter = 0, majority_element = null. As you scan, if the counter is 0, you assume current element as majority element and increment counter. If counter != 0, you increment or decrement the counter according to whether current element is the current majority element. This algorithm doesn't give you the majority if there isn't one. If you want level of correctness, you would have to make one more pass to validate it is in fact the majority (ie >= 50%) – Nov 29 '10 at 17:07

score 7 · Answer 3 · answered Nov 30 '10 at 05:48

7

This is a popular challenge question, and the answer is that it's impossible. The language of strings with majority elements is not regular (this is easily proven by the pumping lemma) so there is no way it can be recognized in constant space.

Of course the trick is that you need a counter variable which takes O(log n) space, but since n is boundd by 2^32 or 2^64 and your computer is really a finite state machine with ~8^(ramsize+hddsize) states, everything is O(1).

answered Nov 30 '10 at 05:48

R.. GitHub STOP HELPING ICE

208,859
35
376
711

I have the impression that you are trying to solve a much more complex problem, and I think I have solved the OP's problem as well... would you mind reviewing my answer ? – Matthieu M. Dec 02 '10 at 13:14
Theorem: a language is regular if and only if it can be recognized in `O(1)` space. I'm going to do some hand-waving regarding the equivalence of "recognizing arrays with a majority element" and "finding the majority element if it exists". Now, the pumping lemma is a standard tool for proving that a language is not regular, and it's very easy to use. With it you can prove that the language consisting of arrays that have a majority element is not regular. (As an exercise, try to find a regular expression that matches such strings with majority where the members can be ASCII letters. You can't.) – R.. GitHub STOP HELPING ICE Dec 02 '10 at 13:49
[@R..] I'm quite interested about Theory of Computation and instructed the course as a Teaching Assistant twice. I have a question: I believe the predictive parsing algorithm for converting any CFG to a working parser is linear in time and doesn't use extra space. How comes it's *if and only if*? Because this means that any language that's recognized recognized in O(1) space is regular! Think also of (a^n)(b^n)(c^n) language (which is neither regular nor context-free) yet has a O(1) space solution! Please tell me what I'm missing... – OmarOthman Oct 27 '11 at 10:39
1

It does not have an O(1) space solution. Counting to `n` takes O(log n) or at least O(log log n) space. – R.. GitHub STOP HELPING ICE Oct 27 '11 at 12:15

score 7 · Answer 4 · answered Dec 02 '10 at 13:10

7

I think it is possible, using Boyer-Moore, though not directly.

As Matthew stated, Boyer-Moore only guarantees to find the majority element for a slightly different definition of majority, called strict majority. Your definition is slightly weaker, but not by much.

Execute Boyer-Moore: O(N) time, O(1) space
Check that the candidate fulfills the condition: O(N) time, O(1) space
If it doesn't, execute Boyer-Moore, but ignores the instances of the "failed" candidate: O(N) time, O(1) space
Check that the (new) candidate fulfills the condition: O(N) time, O(1) space

The 1. and 2. steps are straight-forward. The 3. works because by removing instances of the failed candidates, we are now looking for a strict majority element. The 4. is optional, and only to be used if there is a possibility that no majority element exists.

answered Dec 02 '10 at 13:10

Matthieu M.

287,565
48
449
722

1

This is the standard "answer" and solves what the person asking the problem meant to ask, but the `O(1)`'s are technically wrong. The storage is one or more integer variables whose values can reach `n`, so it takes `log n` space. For practical purposes however `log n` is usually bounded by a constant like 32 or 64. :-) – R.. GitHub STOP HELPING ICE Dec 02 '10 at 13:43
Ah, I see what you mean. Indeed the counter size depends on N :) – Matthieu M. Dec 02 '10 at 13:51
yes, I think this does work for the definition of "majority" in the question. Nice; +1. – Matthew Slattery Dec 03 '10 at 00:12

score 2 · Answer 5 · answered Nov 25 '10 at 20:42

If you know that the majority element is in more of half of the array size then there is such algorithm. You keep track of the most common element and the repetitions of it. When you start that element is the first and there is one repetition. If the next element is different from the current most common then you substract one from the repetitions. If the repetitions become zero then you change the most common with the element you are currently observing and set the repetitions to 1.

Thorarin · Answer 6 · 2010-12-02T12:44:58.407

0

I could be wrong, but the combination of a O(n) execution time and constant memory usage seems impossible to me. Not using extra space would require sorting. The fastest comparison sort is O(n log n).

Using Radix sort, you can get a better worst case execution time, but more memory usage.

edited Dec 02 '10 at 12:44

answered Nov 25 '10 at 20:07

Thorarin

47,289
11
75
111

Why "would require sorting"? There *is* such an algorithm (note the question: it just wants some element that occurs at least n/2 times, if such an element exists (which is rather special!)). – ShreevatsaR Nov 25 '10 at 20:11
@ShreevatsaR: Can you tell me what the algorithm is called? Because I doubt it's O(n) – Thorarin Nov 25 '10 at 21:30
@Thorarin: It's a very nice algorithm due to Boyer-Moore, see numes's answer. Basically, you just keep the count of *one* character, decrementing it whenever you see another character and updating the character when the count goes to 0. (BTW, I wasn't the one who downvoted this answer.) – ShreevatsaR Nov 25 '10 at 21:46
@Thorarin: it's not because you cannot think about an algorithm that it doesn't exist. Proving lower bounds is hard :) – Matthieu M. Nov 27 '10 at 17:58
Your answer is correct; whoever downvoted is an idiot. It's easy to prove with the pumping lemma that there is no `O(1)` space solution. – R.. GitHub STOP HELPING ICE Nov 30 '10 at 05:49
@Matthieu: I'm not familiar with the mentioned pumping lemma, but it's pretty easy to see that a general solution in O(n) time and O(1) space is impossible. The fastest comparison sort is O(n log n), which is what I was referring to. However, I should have mentioned the possibility of using Radix sort for better than n log n performance. – Thorarin Dec 02 '10 at 12:27
@Thorarin: for "strict" majority, it is indeed possible (strict meaning that the element appear more than n/2 times), so the "easy to see" or "require sorting" should have been backed up by arguments. R.. for example cites the pumping lemma, though I still fail to see how it applies here. – Matthieu M. Dec 02 '10 at 13:00

score 0 · Answer 7 · answered May 22 '11 at 08:22

Use pre-stages of heap sort:

Build a heap for the array elements which runs in linear time -> O(n).
Then take (N/2)th element & search to upper parent nodes of it if all are equal or not -> O(n/2)

if all are equal then (N/2)th element is ans.

so overall O(n) + O(n/2) -> O(n)

Linear time majority algorithm?

7 Answers7

Linked

Related