Determine if more than half the keys in an array of size n are the same key in O(n) time?

Question

You have an array or list of keys of known size n. It is unknown how many unique keys there are in this list, could be as little as 0 and up to and including n. The keys are in no particular order and they really can't be, as these keys have no concept of greater than or less than, only equality or inequality. Now before you say hash map, here's one more condition that I think throws a wrench in that idea: The value of each key is private. The only information you can get about the key is whether or not it is equal to another key. So basically:

class key{
    private:
        T data;
        ...
    public:
        ...
        bool operator==(const key &k){return data==k.data}
        bool operator!=(const key &k){return data!=k.data}
};

key array[n];

Now, is there an algorithm that can determine if more than half of the keys in the array are the same key in linear time? If not, what about O(n*log(n))? So for example say the array only has 3 unique keys. 60% of the array is populated with keys where key.data==foo, 30% key.data==bar and 10% key.data==derp. The algorithm only needs to determine that more than 50% of the keys are of the same kind (keys with data==foo) and also return one of those keys.

According to my professor it can be done in O(n) time but he says we only have to find one that can do it in O(n*log(n)) time.

Yes, this can easily be done in O(n) time, using the SOP technique. Since this is for a class, I assume that you will want to figure this out for yourself? — RBarryYoung, Feb 13 '13 at 04:46
mike: you can do it in O(1) space and O(N) time, by the way. — rici, Feb 13 '13 at 05:23
@thang, there's no answer in here worth up-voting on in my opinion. — Alain, Feb 13 '13 at 05:41
@Alain, I don't understand where that statement comes from... I meant that as an answer to Mike's question. — thang, Feb 13 '13 at 05:42
@thang, I interpreted your comment "how bout voting" as pressuring the asker to vote on one of the existing solutions. Is this not what you meant? — Alain, Feb 13 '13 at 05:43
@Alain no I meant that as an answer to Mike's question. I have something in mind to do it... it's a merge of all the hints I gave, runs in O(1) space, O(n) time, and the best name I can give is voting — thang, Feb 13 '13 at 05:44
I see - rici's link clears things up as well. I wasn't aware of such a thing as a "Voting Algorithm" — Alain, Feb 13 '13 at 05:46
@Alain, actually I didn't know it either... now I do. Pretty cool though. — thang, Feb 13 '13 at 05:47
Hashing is the dead-simple answer, it's O(n) space and O(n). As it's incredibly simple and also the answer to about a zillion other programming questions/problems, I'd assume that that's what your professor was alluding to. If you want to cut the space down to O(1), then you could look at the much more specific (and tricky) Boyer-Moore Voting algorithm. — RBarryYoung, Feb 14 '13 at 15:39

MBo · Accepted Answer · 2013-02-14T16:59:32.613

5

If you can extract and hold any key for further comparisons, then the Boyer-Moore Majority Vote Algorithm is your friend.

edited Feb 14 '13 at 16:59

answered Feb 13 '13 at 05:49

MBo

77,366
5
53
86

1

It seems like this algorithm will give the **wrong** answer if there ISN'T a majority. For instance 'AABBC' will return `C:1`. 'CCCBBCA' will also return `C:1`. In the latter case, it's returned the majority letter, which is the purpose, but since we don't know going in whether or not there **is** a majority key, we have no way of knowing that the former result is incorrect. – Alain Feb 13 '13 at 05:53
From the site: "But if you must confirm that the chosen element is indeed the majority element, you must take one more linear pass through the data to do it." Also, he wants an answer that will detect if *at least* half the keys are the same. So 'CCCCBBAA' should identify 'C' as being at least half the keys, but this algorithm would return `?:0` - false negative. Not downvoting though, because it's a step in the right direction, – Alain Feb 13 '13 at 05:56
Algorithm gives right answer, if we know that majority element does exist, otherwise we should check if the answer is real majority - with the second pass. – MBo Feb 13 '13 at 05:59
@Alain: once you've figured out the candidate, you do a second pass where you count occurrences. Note that if you have an even number of keys and one of them occurs exactly half the time, then one of the two following statements is true: (1) the key which occurs half the time is the majority key at the second last position in the array. (2) the key which occurs half the time is the last key in the array. – rici Feb 13 '13 at 05:59
@rici - true, I tripped up on the fact that (2N) time is O(N). Still, I don't see how we get around the false negative in the case where a key has *exactly* 50% and results in a `?:0` output. – Alain Feb 13 '13 at 06:01
@Alain: Author just edited the question with "more than half the keys" – MBo Feb 13 '13 at 06:02
@Alain: read what I just wrote more carefully. Note that checking exactly two candidates also requires O(1) space and O(N) time. – rici Feb 13 '13 at 06:02
@MBo, perhaps the author made a mistake, but he did explicitly say "at least half" both in the title and question body. As for the exactly 50% problem, I don't see it being solved by making exactly three passes (checking 2 candidates). There's nothing about the output `?:0` that lends itself to a straightforward followup check for exactly half the keys matching. There could be 51 distinct keys out of 100 and one that has 50 instances. No way of knowing which repeats 50 times given the output. – Alain Feb 13 '13 at 06:03
Yes this is the answer. I have since edited the question to say more than half. Sorry about that. – Mike Feb 13 '13 at 06:08
1

@Alain: The 50% algorithm: Q: You have an array of 2N elements. N of them are the same. Find it. A: Either the last element in the array is the 50% element or it isn't. If it isn't, then there are N identical elements in the first 2N-1, and the voting algorithm will find it. If it is, the last element is the 50% element. So: run the voting algorithm on the first 2N-1 elements (all but the last) then count instances of the resulting candidate in the array. If you don't find N instances of it, count instances of the last element. – rici Feb 13 '13 at 16:42
Hashing is the dead-simple answer, it's O(n) space and O(n). As it's incredibly simple and also the answer to about a zillion other programming questions/problems, I'd assume that that's what your professor was alluding to. If you want to cut the space down to O(1), then you could look at the much more specific (and tricky) Boyer-Moore Voting algorithm. – RBarryYoung Feb 14 '13 at 15:41

score 0 · Answer 2 · answered Feb 13 '13 at 08:42

If you don't want use BM algorithm, you could use the 2 following algorithm based on the same idea.

Algorithm a. At each run maintain a set S of M (a small part of N, for example 10) pairs element-counts, while going through array for each element:

1.1. If element E is in the set, increase count of the corresponding pair (E,count) -> (E, count+1)
1.2. If not, drop out element with minimal count and insert new pair (E,1)

If element has frequency F > 0.5 it will be in the set at the end of this procedure with probability (very roughly, actuallty much higher) 1 - (1-F)^M, at the second run calculate actual frequencies of elements in the set S.

Algorithm b. Take N series of length M of randomly picked elements of the array, select the most frequent element by any method and calculate frequency for it and middle frequency over series, the maximal error of frequency calculation would be
F (realFrequency) / sqrt(N). So if you get F_e* 1 - 1.0 /sqrt(N) ) > 0.5 then you find the most frequent element, if you get F_e(1 + 1.0/sqrt(N)) < 0.5 this element don't exists. F_e is estimated frequency.

score -2 · Answer 3 · answered Feb 13 '13 at 04:50

-2

One solution that comes in my mind is.. You will pick first element from this array, and traverse through the list and all the matching elements you put in a separate arraylist, Now you pick the second element from the original list and compare it with first if they are equal , leave this element and pick the next one. This could be a possible solution.

answered Feb 13 '13 at 04:50

Sunny Gupta

6,929
15
52
80

1

It will result in O(n^2) – Nir Alfasi Feb 13 '13 at 04:53

Determine if more than half the keys in an array of size n are the same key in O(n) time?

3 Answers3