13

So for example, the answer for the array:

1, 11, 3, 95, 23, 8, 1

would be 1, since all the other elements only occur once while 1 occurs twice.

A lot of similar questions I've seen on Stack Overflow ask to find the absolute majority (an element that occurs at least n/2 times in an array of length n), or answer the question using sorting or a hash table. The former is not what I'm asking, and the latter is either too slow (O(n log n) for sorting) or uses too much memory (O(n) for a hash table).

Does such an algorithm exist? If not, is there a proof showing why it's impossible? Including a source would be nice.

weeb
  • How about a linear scan? It seems like you applied no thought to this. – Marcin Aug 02 '12 at 16:20
  • 2
    Can you explain what you mean by a linear scan, and no, this is not homework – weeb Aug 02 '12 at 16:21
  • 3
    possible duplicate of [How can we find a repeated number in array in O(n) time and O(1) space complexity](http://stackoverflow.com/questions/7370978/how-can-we-find-a-repeated-number-in-array-in-on-time-and-o1-space-complexit) – Marcin Aug 02 '12 at 16:22
  • O(1) space seems a big challenge – Razvan Aug 02 '12 at 16:23
  • @Felix A linear scan simply means to iterate in one direction, once over the list. See the linked question and answer. In short, the data structure that meets the space criterion and enables a linear scan is a fixed-size array sufficiently big to hold the range of possible integers found in the input. – Marcin Aug 02 '12 at 16:26
  • Marcin, the answer that was accepted is most definitely not O(1) space, and it's the same idea as using a hash table and doing, as you said, a linear scan. However, the other two answers seem to answer my question, so thanks. If anyone has a proof though, that'd be much appreciated – weeb Aug 02 '12 at 16:27
  • It can't be done, for sure. Look at the following array: A = {1,2,3,...,n, 1,2,3,...,n, ..., 1,2,3,...,n}. – barak1412 Aug 02 '12 at 17:00
  • 4
    If your array contains ints, then you can do it in O(n) time and O(1) space. Sort the array using an in-place binary radix sort. Then make a pass through the array to determine the most common element. Radix sort requires 32 passes through the array, so 33 passes total, which is a constant, so O(n). – hatchet - done with SOverflow Aug 02 '12 at 17:12
  • If an array of size _k_ is needed, where _k_ is the number of discrete values that can possibly be encountered, then that's O(_k_) space. For O(1), the space needed has to be constant for all possible inputs. – Sean U Aug 02 '12 at 17:55
  • http://stackoverflow.com/questions/4325200/find-majority-element-in-array – Petar Minchev Aug 02 '12 at 18:46
  • @PetarMinchev - that is a different problem (which element occurs > 50%). There is a great algorithm by Boyer for that problem http://www.cs.utexas.edu/users/moore/best-ideas/mjrty/index.html – hatchet - done with SOverflow Aug 02 '12 at 18:54
  • @hatchet - Yes, I know this algorithm. I just misread the question, lol. Anyway thank you for the link:) – Petar Minchev Aug 02 '12 at 19:38
  • If a hashmap takes too much memory, can we also assume you can't keep the entire input in memory? That is, should the problem be analysed in the external-memory or streaming models? Or do you just want to spend only O(1) additional memory? – Thomas Ahle Apr 09 '16 at 12:17

5 Answers

2

If you want fixed space to find the most common element, you need a maximum number of bits per element. Without that bound, large input arrays could contain numbers so large that the bits needed to represent a single number exceed your fixed space for storing the result.

Suppose k is the number of bits in the largest number you support. If you naively create an array of 2^k buckets to count occurrences of each number (counting sort), you could receive an input consisting entirely of the same number, in which case your algorithm would end up needing log(n) space just to store that count.[*]

If we look at a simpler version of the problem - determining whether there are more 1's or 0's in the input - I think you need a counter to do this (you store how much 1 or 0 is leading by), which can grow to log(n) bits, and so constant space just isn't possible, even if we limit the input elements to k = 1 bit in size.

Your problem is more general (k > 1, but still fixed), and would also need non-constant space, so it's not possible, as the question is worded.

[*] If you assume counters have O(1) space complexity, then you can take the counting sort approach, although doing so places an upper bound on the maximum size of your input array (which may or may not be acceptable). In terms of k, the maximum number of bits of an input element, and c, the maximum number of bits in your counter, the array can have at most 2^k * 2^c elements (otherwise one of the counters would overflow on the next element).

To address this, you could add an O(1)-time step after each element is processed: if all counters are non-zero, decrement each of them by 1, so that the minimum counter value is always 0; this makes the counters relative instead of absolute. The step takes O(1) time per element because, when all counters are non-zero, you only need to decrement O(2^k) = O(1) counters by 1.

While the algorithm can now process some arbitrarily large inputs, any input array containing a sub-array in which two values a and b satisfy count(a) - count(b) > 2^c = max(counter) will still make the counter strategy fail. In fact, a consequence of relying on an O(1)-space counter approach is that any array starting with 2^c + 1 identical elements cannot be handled by this algorithm.
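
To make the footnote concrete, here is a minimal PHP sketch of this bounded counting approach with relative counters. The function name, the particular choices of k and c, and the failure behaviour (returning null on counter overflow) are illustrative assumptions, not something specified above:

<?php
// Counting approach with "relative" counters, assuming k-bit elements
// (here k = 2, so values 0..3) and c-bit counters. The cap 2^c and all
// names here are illustrative only.
function mostCommonBounded(array $input, int $k = 2, int $c = 8) {
    $buckets = array_fill(0, 1 << $k, 0);  // 2^k counters, all zero
    $cap = 1 << $c;                        // counter overflow threshold
    foreach ($input as $x) {
        $buckets[$x]++;
        if ($buckets[$x] > $cap) {
            return null;  // relative difference no longer representable
        }
        // If every counter is non-zero, shift all of them down by one so
        // the minimum stays 0 (counters become relative, not absolute).
        if (min($buckets) > 0) {
            foreach ($buckets as $i => $v) {
                $buckets[$i] = $v - 1;
            }
        }
    }
    // The bucket with the largest relative count is the most common value.
    return array_search(max($buckets), $buckets);
}

echo mostCommonBounded(array(1, 3, 3, 0, 3, 2, 1)); // prints 3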

Jesus is Lord
  • The question says that a hashtable is O(n) memory, so we must consider a counter to be O(1) memory. Hence at least the 0/1 case should be easy. – Thomas Ahle Apr 09 '16 at 13:11
  • @ThomasAhle I think OP meant that if I have an input array with `n` items then a hashtable, which stores whether or not I've seen a number, takes up to `n` items inside it and, in its final state (after any resizes due to going over its threshold), is `O(n)` memory. The `O(1)` "counter" is equivalent to a bit saying whether or not I've seen it - not a counter saying how many I've seen... unless OP gave indication otherwise. – Jesus is Lord Apr 09 '16 at 13:20
  • Ok, a bit vector would take only O(n) bits. A hashtable will usually be n log n, even if the values are bits, since it needs to store the keys for equality checking. (Maybe perfect hashing helps.) But how does knowing if we've seen a value before help solve the problem? It seems to me we need to know the count. – Thomas Ahle Apr 09 '16 at 13:27
  • @ThomasAhle When you say "A hashtable will usually be n log n...", what is your "n"? It doesn't help us solve the problem - we would need the count (or really just how far ahead something is, in case it falls behind later; you don't need to compute the actual count to compute the majority) - you're right - but I think OP probably didn't know which data structures would end up being useful for this problem; I think OP's point was to clarify with an example of something that was "against the rules". – Jesus is Lord Apr 09 '16 at 13:32
  • Yes, OP meant for the table to store the counts, and considered this O(n) memory. Hence one counter is O(1) memory. That's my point. – Thomas Ahle Apr 09 '16 at 13:57
  • Well, if I can store a counter in O(1) = `c` space and `k` is the maximum length of a number, I should be able to just do a counting sort like in the second paragraph of my answer... Implicitly fixing the counter to `O(1)` space also fixes the maximum length my input array can be, in this case to `O(2^k * 2^c)`. – Jesus is Lord Apr 09 '16 at 14:17
  • (`2^k` represents the number of distinct keys I can support in my hashtable and `2^c` represents the number of distinct counts each counter can represent.) I could maybe decrement all the counters so that they're relative instead of absolute as I go along (which decrementing incurs time - but I'm ignoring that temporarily because I'm just focusing on space, here) so that I'm only storing the relative differences between numbers, but then I'll still run into the problem of one number occurring `2^c` times more than another and not being able to represent that on my relative counter encoding. – Jesus is Lord Apr 09 '16 at 14:20
  • @ThomasAhle I usually try to answer what I think the OP means on SO. However, after reading the comments on the question for the first time, it seems OP found an acceptable solution to their actual problem, meaning what's of value in the future is others reading the precise question and answers - OP is not important - so I will focus on what is contained in the question body (comments shouldn't provide clarification that isn't later incorporated into the question/answer). I've incorporated the possibility of counters like you mentioned in my answer, and I think it's in a good place now. – Jesus is Lord Apr 09 '16 at 15:37
  • @ThomasAhle Let me know if you have any potential concerns with my answer - otherwise I'll leave it. Good discussion! :) – Jesus is Lord Apr 09 '16 at 15:38
1

Use the idea from here:

How can we find a repeated number in array in O(n) time and O(1) space complexity

And apply a technique similar to counting sort. That is, create N bins (an array of size N), where N is the largest integer you expect to encounter; under that assumption, this is still O(1) space. Then iterate through the original array in O(n) time, and when you encounter a value i, increment your results array at index i by 1. Finally, iterate through the results array (O(N) time, again constant under the assumption above) to find the largest single count. The index of that count is the most common value in the original list.
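
For illustration, here is a minimal PHP sketch of this binning approach; the function name and the fixed bound N are assumptions for the example, and it only works for non-negative integers below N:

<?php
// Bin-counting sketch: assumes every element is an integer in 0..N-1
// for some fixed N known in advance (N = 100 here is purely illustrative).
function mostCommon(array $input, int $N = 100) {
    $bins = array_fill(0, $N, 0);  // one counter per possible value
    foreach ($input as $i) {
        $bins[$i]++;               // single O(n) counting pass
    }
    // One pass over the N bins to find the value with the largest count.
    return array_search(max($bins), $bins);
}

echo mostCommon(array(1, 11, 3, 95, 23, 8, 1)); // prints 1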

maxko87
  • Is that feasible? What if the input is an array with two integers { int.MinValue, int.MaxValue }? n = 2, but you need to create an array of enormous size. – hatchet - done with SOverflow Aug 02 '12 at 17:51
  • This is with the assumption (which would usually be true in practice) that the size of the input is much larger than the maximum value of the input. Otherwise, you can do another pass through the array beforehand and make sure that this isn't the case - and use another algorithm if it is. This doesn't alter the runtime. – maxko87 Aug 02 '12 at 17:58
  • 1
    Your assumption seems limiting and unnatural. The only assumption you can make for the range of values is that they're bounded by the limits of an int, regardless of the size of the array. – hatchet - done with SOverflow Aug 02 '12 at 18:00
  • I think the part I disagree with is that your approach is O(1) space. Given the limitation that inputs must be ints, the size of the array you create varies depending on the input. So it is not constant. The only way you can say it's constant (and thus O(1)) is if you say you always create an array of size equal to the number of discrete int values (4+ billion), which is not that useful. I don't think a workaround of "try something else for this input" helps unless you specify precisely what something else is. – hatchet - done with SOverflow Aug 02 '12 at 18:50
1

This is not a complete answer, but it should help shed some light on why this problem is difficult.

Suppose we want to design an algorithm that does one sweep over the array (in some order) to find the most common element. During its run, the algorithm is allowed to keep some data structure S. Let's see how much information S has to contain, and thus whether we can fit it in O(1) memory.

Say our algorithm has processed the first k elements of the array. Now S can tell us the most common element in the range a[0..k]. Moreover, given the (k+1)'st element, S must also determine the most common element in the range a[0..k+1]; if it couldn't, our algorithm would fail on inputs of length n = k+1. More generally, given S and knowledge of the elements a[k..m], we know the most common element in a[0..m].

We can use the above argument to extract information from S. Say we are working with integers in the range [0,u] (there has to be some bounded range if the original array took space O(n)). If the current most common element is 5, then we feed in 0's until the most common element changes; if that takes c zeroes, a[0..k] must have contained c more 5's than 0's. Repeating this argument for the other values gives a system of linear equations which we can solve to tell exactly how many times each of the elements of [0,u] was present in a[0..k].

This tells us that any data structure that does a sweep might as well store the counts of all the elements seen so far (in some compressed way). If you're interested in the maths, the information stored after seeing n numbers is log(n+u-1 choose n) bits, which is the log of the number of ways to distribute n indistinguishable items into u distinguishable bins. That's more than log(u^n/n!) >= n log u - n log n.
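
For reference, the inequality follows by bounding the binomial coefficient factor by factor (a standard estimate, spelled out here for convenience):

\log \binom{n+u-1}{n} = \log \frac{u(u+1)\cdots(u+n-1)}{n!} \ge \log \frac{u^n}{n!} \ge n \log u - n \log n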

Conclusion: Any algorithm that does only one pass over the array will have to use as much memory as it takes to store all the counts seen so far. If n is small compared to u, this corresponds to storing n words of memory.

(Well, instead of extra memory we might also overwrite the existing array).

There's a lot more to explore here, e.g. how multiple passes affect the above arguments. However, I think I should stop at this point :). It doesn't seem likely to me that any linear-time algorithm, with a large u, will be able to get away with O(1) extra memory.

Thomas Ahle
  • I should mention how this relates to maxko87's solution. Maxko87 sweeps through the array and saves the counts in an array of `u log n` bits (`u` bins, each able to count to `n`). If `n` is about the size of `u`, his solution is fine. If `n` is smaller than `u`, his solution can still be made efficient using a hashmap, but it won't be constant space. – Thomas Ahle Nov 16 '14 at 00:45
  • Here is another proof of the same thing http://theory.stanford.edu/~trevisan/cs154-12/notestream.pdf – Thomas Ahle Nov 16 '14 at 15:38
  • You can do more than one pass and still be O(n)... (comment for anyone - valuable information here in this answer, but know that assuming only 1 pass is stricter than the requirement of O(n), which is what the problem asks for) – Jesus is Lord Apr 09 '16 at 12:05
  • He says linear memory is too much. If he could do multiple passes, I'd assume he had to save the input. – Thomas Ahle Apr 09 '16 at 12:15
  • You bring up a good point. I suppose if you mean it like that, then binary search is O(n), not O(log(n)), because you have to read the input into your Turing machine... I was assuming OP had a pre-initialized external array that could be queried by index in constant time by some algorithm we're trying to design - therefore more than one pass is possible. – Jesus is Lord Apr 09 '16 at 12:24
  • 1
    Right, it all depends on the model. It's also important whether the input is then stored in readonly memory. Otherwise the radixsort approach would work. I don't know what practical situation that would correspond to though. – Thomas Ahle Apr 09 '16 at 12:30
  • Well if you could modify the input then that would seem like cheating the space requirement, but still needs to be specified like you said depends on the model. Modifying the input would seem impractical to me, as well - normally `O(n)` means proportionate to the input - if you could modify the input on external storage then you're using a form of `O(n)` storage but you're limited to just `n`, so it's weird. Readonly seems more intuitive because the question is can I predict the memory impact of this algorithm and know I won't need more RAM (I also didn't realize how old this question was.) – Jesus is Lord Apr 09 '16 at 12:37
  • 1
    To put it another way - destroying your input array implies multiple runs of your algorithm will need an array initialized in time and space, so it's not practical, like you said. Having pre-initialized readonly memory can exist in successive or parallel runs of an algorithm, and so the initialization you do up front is a one-time thing. If I have some immutable data structure, I can write small, compact space and time algorithms to analyze it in parallel, over and over... – Jesus is Lord Apr 09 '16 at 12:47
  • Of course since Turing machines correspond to the integers I can make a lookup table for the first `T` Turing machines and what they will return (assume this can be represented in `R` bits) for inputs less than or equal to `M` and use this `O(T*2^M*R)` size block of memory to figure out what to do instead of actually running them... so you can't get carried away with this freebie of pre-initialized memory. – Jesus is Lord Apr 09 '16 at 12:53
0

There is a well-documented algorithm for this, known as the Boyer-Moore majority vote algorithm. (Note that it finds an element occurring more than n/2 times, if one exists - the absolute majority case the question mentions.)

Initialize an element m and a counter i with i = 0
For each element x of the input sequence:
    if i = 0, then assign m = x and i = 1
    else if m = x, then assign i = i + 1
    else assign i = i - 1
Return m

It's so simple that it's somewhat hard to believe it's correct, IMO. I recommend reading the proof.
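
A direct PHP translation of those steps, as a sketch (the function name is mine; keep in mind the returned candidate is only guaranteed to be the answer when some element really occurs more than n/2 times):

<?php
// Boyer-Moore majority vote: O(n) time, O(1) space.
function majorityVote(array $a) {
    $m = null;
    $i = 0;
    foreach ($a as $x) {
        if ($i == 0) {
            $m = $x;      // adopt a new candidate
            $i = 1;
        } elseif ($m == $x) {
            $i++;         // candidate reinforced
        } else {
            $i--;         // candidate challenged
        }
    }
    return $m;
}

echo majorityVote(array(3, 1, 3, 3, 2, 3)); // prints 3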

max
-2

This is my script to find the most common element in an array:

<?php

class TestClass {

    public $keyVal = 0;    // number of occurrences of the most common element
    public $keyPlace = 0;  // index of its first occurrence

    // Return the most common element of $array.
    // O(n^2) time, O(n) extra space.
    public function maxused_num($array) {
        $n = count($array);
        $counts = array();

        // For each position, count how many times that value occurs.
        for ($i = 0; $i < $n; $i++) {
            $r = 0;
            for ($j = 0; $j < $n; $j++) {
                if ($array[$i] == $array[$j]) {
                    $r++;
                }
            }
            $counts[$i] = $r;
        }

        // Find the first index holding the maximum count.
        for ($i = 0; $i < $n; $i++) {
            if ($counts[$i] > $this->keyVal) {
                $this->keyVal = $counts[$i];  // 1. number of repeats
                $this->keyPlace = $i;         // 2. place holder in the array
            }
        }

        return $array[$this->keyPlace];
    }

}

$catch = new TestClass();
$array = array(1, 1, 1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 1, 2, 3, 1, 1, 2, 5, 7, 1, 9, 0, 11, 22, 1, 1, 22, 22, 35, 66, 1, 1, 1);
echo $catch->maxused_num($array);
Tunaki