10

Given an array of n integers, where one element appears more than n/2 times. We need to find that element in linear time and constant extra space.

YAAQ: Yet another arrays question.

baskin
  • 1,643
  • 1
  • 18
  • 29
  • This is a duplicate, but I can't remember either which question it is or what the answer is... – Jon Skeet Apr 13 '09 at 19:02
  • Found the duplicate: http://stackoverflow.com/questions/278488/puzzle-find-the-most-common-entry-in-an-array (I've got a different answer there. Wrote this answer before finding duplicate...) – Jon Skeet Apr 13 '09 at 19:09
  • http://stackoverflow.com/questions/7059780/find-the-element-repeated-more-than-n-2-times – Aghoree Jul 14 '12 at 06:57
  • `Yet another arrays question` … not tagged [tag:arrays]. (Why?) – greybeard Oct 03 '17 at 19:43

9 Answers9

23

I have a sneaking suspicion it's something along the lines of (in C#)

// We don't need an array
public int FindMostFrequentElement(IEnumerable<int> sequence)
{
    // Initial value is irrelevant if sequence is non-empty,
    // but keeps compiler happy.
    int best = 0; 
    int count = 0;

    foreach (int element in sequence)
    {
        if (count == 0)
        {
            best = element;
            count = 1;
        }
        else
        {
            // Vote current choice up or down
            count += (best == element) ? 1 : -1;
        }
    }
    return best;
}

It sounds unlikely to work, but it does. (Proof as a postscript file, courtesy of Boyer/Moore.)

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • Cool! It works because (by induction) the element you're looking for would be the same as that found given a smaller array where tails consisting of a smaller number of other elements had been removed. – MarkusQ Apr 13 '09 at 19:13
  • This should work because eventually you'll have to get repeated items,i.e it should work with {...random.... ,1,1,1,1,1,1..} and with the worst case of {1,a,1,b,1,c,1,d.. } – Steve B. Apr 13 '09 at 19:14
  • 1
    Hmm, this doesn't even work? Try: int[] wrongwrongwrong = {4, 2, 2, 3, 2, 4, 3, 2, 6, 4}; – Blankman May 19 '10 at 04:55
  • @Blankman: Hmm... odd. I'll look into fixing it. Thanks for pointing it out. – Jon Skeet May 19 '10 at 05:40
  • 1
    @Blankman: Oh, I've worked it out. In your example, no element appears more than n/2 times. My implementation doesn't validate that the input matches the question description. – Jon Skeet May 19 '10 at 05:45
  • Yes :) it gets tricky, b/c then you have to keep some sort of a history of the 'best' counter. – Blankman May 19 '10 at 14:24
  • It works because the element that appears more than n/2 times gets more votes than any other element can ever get. – tilish Aug 08 '11 at 21:35
  • @JonSkeet I have added a tiny edit for a corner case, i.e there is no element which appears more than n/2 times. Return -1 in such case. – jerrymouse Sep 18 '12 at 15:33
  • @jerrymouse: I see no such edit, but yes, this code does assume that it's given input which complies with the question description. – Jon Skeet Sep 18 '12 at 15:47
  • @JonSkeet Someone must have removed the edit. Its a small edit. Replacing `return best` with: `count = 0; size = 0; foreach (int element in sequence) { if (best == element) count ++; size++; } if (count>size/2) return best; else return -1;` – jerrymouse Sep 18 '12 at 16:53
  • @jerrymouse, 10 years later, but.... that isn't correct. Consider `{b,a,b,a,b,a,a}`, at each `b` `count` is set to `0`. At the end the `count` is only `2`. – Elliott Oct 26 '22 at 11:11
8

Find the median, it takes O(n) on an unsorted array. Since more than n/2 elements are equal to the same value, the median is equal to that value as well.

2
int findLeader(int n, int* x){
    int leader = x[0], c = 1, i;
    for(i=1; i<n; i++){
        if(c == 0){
            leader = x[i];
            c = 1;
        } else {
            if(x[i] == leader) c++;
            else c--;
        }
    }

    if(c == 0) return NULL;
    else {
        c = 0;
        for(i=0; i<n; i++){
            if(x[i] == leader) c++;
        }
        if(c > n/2) return leader;
        else return NULL;
    }
}

I'm not the author of this code, but this will work for your problem. The first part looks for a potential leader, the second checks if it appears more than n/2 times in the array.

Pyetras
  • 1,492
  • 16
  • 21
2

This is what I thought initially.

I made an attempt to keep the invariant "one element appears more than n/2 times", while reducing the problem set.

Lets start comparing a[i], a[i+1]. If they're equal we compare a[i+i], a[i+2]. If not, we remove both a[i], a[i+1] from the array. We repeat this until i>=(current size)/2. At this point we'll have 'THE' element occupying the first (current size)/2 positions. This would maintain the invariant.

The only caveat is that we assume that the array is in a linked list [for it to give a O(n) complexity.]

What say folks?

-bhupi

baskin
  • 1,643
  • 1
  • 18
  • 29
1

Well you can do an inplace radix sort as described here[pdf] this takes no extra space and linear time. then you can make a single pass counting consecutive elements and terminating at count > n/2.

MahdeTo
  • 11,034
  • 2
  • 27
  • 28
0

My first thought (not sufficient) would be to:

  • Sort the array in place
  • Return the middle element

But that would be O(n log n), as would any recursive solution.

If you can destructively modify the array (and various other conditions apply) you could do a pass replacing elements with their counts or something. Do you know anything else about the array, and are you allowed to modify it?

Edit Leaving my answer here for posterity, but I think Skeet's got it.

MarkusQ
  • 21,814
  • 3
  • 56
  • 68
0

How about: randomly select a small subset of K elements and look for duplicates (e.g. first 4, first 8, etc). If K == 4 then the probability of not getting at least 2 of the duplicates is 1/8. if K==8 then it goes to under 1%. If you find no duplicates repeat the process until you do. (assuming that the other elements are more randomly distributed, this would perform very poorly with, say, 49% of the array = "A", 51% of the array ="B").

e.g.:

findDuplicateCandidate: 
    select a fixed size subset.
    return the most common element in that subset
    if there is no element with more than 1 occurrence repeat.
    if there is more than 1 element with more than 1 occurrence call findDuplicate and choose the element the 2 calls have in common    

This is a constant order operation (if the data set isn't bad) so then do a linear scan of the array in order(N) to verify.

Steve B.
  • 55,454
  • 12
  • 93
  • 132
-1

in php---pls check if it's correct

function arrLeader( $A ){
$len = count($A);
$B = array();
$val=-1;
$counts = array_count_values(array); //return array with elements as keys and occurrences of each element as values
for($i=0;$i<$len;$i++){
    $val = $A[$i];
    if(in_array($val,$B,true)){//to avoid looping again and again
    }else{
     if($counts[$val]>$len/2){
      return $val;
     }
     array_push($B, $val);//to avoid looping again and again
    }
 }
 return -1;
}
dbKooper
  • 1,035
  • 10
  • 17
-2
int n = A.Length;
            int[] L = new int[n + 1];
            L[0] = -1;
            for (int i = 0; i < n; i++)
            {
                L[i + 1] = A[i];
            }
            int count = 0;
            int pos = (n + 1) / 2;
            int candidate = L[pos];
            for (int i = 1; i <= n; i++)
            {
                if (L[i] == candidate && L[pos++] == candidate)
                    return candidate;
            }
            if (count > pos)
                return candidate;
            return (-1);
Mallik
  • 1
  • Welcome to Stack Overflow! Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation [would greatly improve](//meta.stackexchange.com/q/114762) its long-term value by showing *why* this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please [edit] your answer to add some explanation, including the assumptions you've made. – Toby Speight Oct 03 '17 at 13:43