14

I have read this problem Find the most common entry in an array

and the answer from jon skeet is just mind blowing .. :)

Now I am trying to solve this problem find an element which occurs more than n/3 times in an array ..

I am pretty sure that we cannot apply the same method because there can be 2 such elements which will occur more than n/3 times and that gives false alarm of the count ..so is there any way we can tweak around jon skeet's answer to work for this ..?

Or is there any solution that will run in linear time ?

Community
  • 1
  • 1
Sree Ram
  • 819
  • 3
  • 12
  • 22
  • 4
    [Related](http://stackoverflow.com/questions/14761106/determine-if-more-than-half-of-the-array-repeats-in-a-distinct-array). Read Jan's answer - "it should be possible to adapt to a threshold of 33%". – Bernhard Barker Feb 19 '13 at 10:57

6 Answers6

27

Jan Dvorak's answer is probably best:

  • Start with two empty candidate slots and two counters set to 0.
  • for each item:
    • if it is equal to either candidate, increment the corresponding count
    • else if there is an empty slot (i.e. a slot with count 0), put it in that slot and set the count to 1
    • else reduce both counters by 1

At the end, make a second pass over the array to check whether the candidates really do have the required count. This isn't allowed by the question you link to but I don't see how to avoid it for this modified version. If there is a value that occurs more than n/3 times then it will be in a slot, but you don't know which one it is.

If this modified version of the question guaranteed that there were two values with more than n/3 elements (in general, k-1 values with more than n/k) then we wouldn't need the second pass. But when the original question has k=2 and 1 guaranteed majority there's no way to know whether we "should" generalize it as guaranteeing 1 such element or guaranteeing k-1. The stronger the guarantee, the easier the problem.

Community
  • 1
  • 1
Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
  • yep ..I think we cant avoid ...much like boyer moore since we have two majority keep two possible counts ... – Sree Ram Feb 19 '13 at 11:18
  • Second pass maybe removed via storing intersection of all removed 3 numbers (not more than 3 numbers) – Толя Feb 19 '13 at 11:51
  • http://www.geeksforgeeks.org/given-an-array-of-of-size-n-finds-all-the-elements-that-appear-more-than-nk-times/ – dsfdf Oct 28 '13 at 00:20
  • I'm not sure if I understand this correctly but could you point out why do we use 2 slots for counting? – Dubby Jun 26 '14 at 06:56
  • As an example we have an array{2,1,8,2,4,2}. 2 occurs more than n/3 = 2 times. At the end of your algo the candidate slots will have 4 and 2 in them each with a count 1, is that right? So we need to just check for both 4 and 2 if they occur more than 2 times? – Dubby Jun 26 '14 at 07:05
  • @Dubby: actually that example ends with one slot containing 2 with count 2, and one slot containing 4 with count 1. I'm starting to think maybe it's enough to have a guarantee that at least one element occurs with frequency greater than n/3 *and* for the two slots to have non-equal counts at the end (i.e, you don't have to do the second pass if the counts are different). But I don't have time at the moment to prove or disprove that. – Steve Jessop Jun 26 '14 at 08:35
  • 3
    Very elegant! Does anyone have a proof that this works? – Ron Jun 21 '17 at 18:02
  • 1
    @Ron In the worst case scenario the element which occurs more than n/3 times will occur in the beginning so that its freq gets decreased max times. Lets call this no. as MAXF. Now frequency of MAXF will be atleast `floor(n/3)+1` so remaining nos. are `n-floor(n/3)-1 = 2*n/3-1 + {n/3}` where `{x}` is fraction part of `x`. So there are less than `2n/3` no. . Among them more of them contribute to increasing one counter, and less of them contribute to decreasing both the counters because the first of these nos. increases one of the counters. So we can say the MAXF counter will always be nonzero – jonsno Jun 23 '20 at 07:58
5

Using Boyer-Moore Majority Vote Algorithm, we get:

vector<int> majorityElement(vector<int>& nums) {
    int cnt1=0, cnt2=0;
    int a,b;
    for(int n: A){
      if (n == a) cnt1++;
      else if (n == b) cnt2++;
      else if (cnt1 == 0){
        cnt1++;
        a = n;
      }
      else if (cnt2 == 0){
        cnt2++;
        b = n;
      }
      else{
        cnt1--;
        cnt2--;
      }
    }
    cnt1=cnt2=0;
    for(int n: nums){
        if (n==a) cnt1++;
        else if (n==b) cnt2++;
    }
    vector<int> result;
    if (cnt1 > nums.size()/3) result.push_back(a);
    if (cnt2 > nums.size()/3) result.push_back(b);
    return result;
}

Updated, correction from @Vipul Jain

didxga
  • 5,935
  • 4
  • 43
  • 58
utkarsh jain
  • 51
  • 1
  • 3
4

You can use Selection algorithm to find the number in the n/3 place and 2n/3.

n1=Selection(array[],n/3);
n2=Selection(array[],n2/3);
coun1=0;
coun2=0;

for(i=0;i<n;i++)
{
    if(array[i]==n1)
      count1++;
    if(array[i]==n2)
      count2++;
}
if(count1>n)
   print(n1);
else if(count2>n)
   print(n2);
else
   print("no found!");
One Man Crew
  • 9,420
  • 2
  • 42
  • 51
  • Note that this requires more than one pass over the data, so it doesn't satisfy the requirements of the original question. But neither does mine. – Steve Jessop Feb 19 '13 at 11:15
  • for(i=0;i>n;i++), maybe shoulbe i – Толя Feb 19 '13 at 11:17
  • 5
    @Steve Jessop linear time=O(N)===>Selection=O(N) 2*Selection + overThe Array=2*O(N)+O(N)=O(N)!!!! – One Man Crew Feb 19 '13 at 11:19
  • @OneManCrew: I agree, it runs in linear time. I was just pointing out that the original question called for a single-pass algorithm. The questioner has not made clear what "solve this problem" really means, is all, since the original problem called for single-pass but then this question further asks whether there is *any* linear time solution. – Steve Jessop Feb 19 '13 at 11:23
  • You may use 1 counter and use ++ and -- and compare with zero =) – Толя Feb 19 '13 at 11:53
  • @OneManCrew could you point out the specific algorithm that could be used here – Dubby Jun 26 '14 at 07:04
3

I use the following Python solution to discuss the correctness of the algorithm:

class Solution:
    """
    @param: nums: a list of integers
    @return: The majority number that occurs more than 1/3
    """
    def majorityNumber(self, nums):
        if nums is None:
            return None
        if len(nums) == 0:
            return None

        num1 = None
        num2 = None
        count1 = 0
        count2 = 0

        # Loop 1
        for i, val in enumerate(nums):
            if count1 == 0:
                num1 = val
                count1 = 1
            elif val == num1:
                count1 += 1
            elif count2 == 0:
                num2 = val
                count2 = 1
            elif val == num2:
                count2 += 1
            else:
                count1 -= 1
                count2 -= 1


        count1 = 0
        count2 = 0

        for val in nums:
            if val == num1:
                count1 += 1
            elif val == num2:
                count2 += 1

        if count1 > count2:
            return num1

        return num2

First, we need to prove claim A:

Claim A: Consider a list C which contains a majority number m which occurs more floor(n/3) times. After 3 different numbers are removed from C, we have C'. m is the majority number of C'.

Proof: Use R to denote m's occurrence count in C. We have R > floor(n/3). R > floor(n/3) => R - 1 > floor(n/3) - 1 => R - 1 > floor((n-3)/3). Use R' to denote m's occurrence count in C'. And use n' to denote the length of C'. Since 3 different numbers are removed, we have R' >= R - 1. And n'=n-3 is obvious. We can have R' > floor(n'/3) from R - 1 > floor((n-3)/3). So m is the majority number of C'.

Now let's prove the correctness of the loop 1. Define L as count1 * [num1] + count2 * [num2] + nums[i:]. Use m to denote the majority number.

Invariant

The majority number m is in L.

Initialization

At the start of the first itearation, L is nums[0:]. So the invariant is trivially true.

Maintenance

  1. if count1 == 0 branch: Before the iteration, L is count2 * [num2] + nums[i:]. After the iteration, L is 1 * [nums[i]] + count2 * [num2] + nums[i+1:]. In other words, L is not changed. So the invariant is maintained.

  2. if val == num1 branch: Before the iteration, L is count1 * [nums[i]] + count2 * [num2] + nums[i:]. After the iteration, L is (count1+1) * [num[i]] + count2 * [num2] + nums[i+1:]. In other words, L is not changed. So the invariant is maintained.

  3. f count2 == 0 branch: Similar to condition 1.
  4. elif val == num2 branch: Similar to condition 2.
  5. else branch: nums[i], num1 and num2 are different to each other in this case. After the iteration, L is (count1-1) * [num1] + (count2-1) * [num2] + nums[i+1:]. In other words, three different numbers are moved from count1 * [num1] + count2 * [num2] + nums[i:]. From claim A, we know m is the majority number of L.So the invariant is maintained.

Termination

When the loop terminates, nums[n:] is empty. L is count1 * [num1] + count2 * [num2].

So when the loop terminates, the majority number is either num1 or num2.

Jingguo Yao
  • 7,320
  • 6
  • 50
  • 63
2

At line number five, the if statement should have one more check:

if(n!=b && (cnt1 == 0 || n == a))
jww
  • 97,681
  • 90
  • 411
  • 885
1

If there are n elements in the array , and suppose in the worst case only 1 element is repeated n/3 times , then the probability of choosing one number that is not the one which is repeated n/3 times will be (2n/3)/n that is 1/3 , so if we randomly choose N elements from the array of size ‘n’, then the probability that we end up choosing the n/3 times repeated number will be atleast 1-(2/3)^N . If we eqaute this to say 99.99 percent probability of getting success, we will get N=23 for any value of “n”.

Therefore just choose 23 numbers randomly from the list and count their occurrences , if we get count greater than n/3 , we will return that number and if we didn’t get any solution after checking for 23 numbers randomly , return -1;

The algorithm is essentially O(n) as the value 23 doesn’t depend on n(size of list) , so we have to just traverse array 23 times at worst case of algo.

Accepted Code on interviewbit(C++):

  int n=A.size();
  int ans,flag=0;
  for(int i=0;i<23;i++)
  {

int index=rand()%n;
int elem=A[index];
int count=0;
for(int i=0;i<n;i++)
{
    if(A[i]==elem)
    count++;
}

if(count>n/3)
{
    flag=1;
    ans=elem;
}

if(flag==1)
break;
}

if(flag==1)
 return ans;
else return -1;
 }
vikash tiwari
  • 109
  • 1
  • 1
  • 9