-1

Note, this is a homework assignment.

I need to find the mode of an array (positive values) and, secondarily, return that value if the mode occurs more than sizeof(array)/2 times, i.e. it is the dominant value. Some arrays will have neither. That is simple enough, but there are two constraints: the array must NOT be sorted prior to the determination, and the complexity must be on the order of O(nlogn).

Using this second constraint and the master theorem, we can determine the time complexity from T(n) = A*T(n/B) + n^D, where A = B and log_B(A) = D for O(nlogn) to be true. Thus, A = B = D = 2. This is also convenient, since the dominant value must be dominant in the 1st half, the 2nd half, or both halves of the array.

Using T(n) = A*T(n/B) + n^D, we know that the search function will call itself twice at each level (A) and divide the problem set in half at each level (B). I'm stuck figuring out how to make my algorithm account for the n^2 work at each level.

To sketch this in code:

int search(int a, int b) {
    int left  = search(a, a + (b - a) / 2);
    int right = search(a + (b - a) / 2 + 1, b);
    // missing: the base case and the "glue" that combines left and right
}

The "glue" I'm missing here is how to combine these divided functions and I think that will implement the n^2 complexity. There is some trick here where the dominant must be the dominant in the 1st or 2nd half or both, not quite sure how that helps me right now with the complexity constraint.

I've written down some examples of small arrays and drawn out the ways they would divide, but I can't seem to find one single method that will always return the dominant value.

At level 0, the function needs to call itself to search the first half and second half of the array, and that recursion continues at each level. Then, at each level, it needs to perform n^2 operations. So for the array [2,0,2,0,2], it would split into a search on [2,0] and a search on [2,0,2], AND perform 25 operations. The search on [2,0] would call a search on [2] and a search on [0], AND perform 4 operations. I'm assuming these operations would need to be a search over the array space itself. I was planning to use C++ and something from the STL to iterate and count the values; I could create a large array and just update counts by their index.
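To illustrate just the counting part I have in mind (not the divide-and-conquer structure I'm stuck on), here is a rough sketch using std::map; the function name and the -1 return for "no dominant value" are only placeholders:

#include <cstddef>
#include <map>
#include <vector>

// Sketch of the counting idea only: build value -> frequency counts with a
// std::map (O(n log n) total insertions), then look for a count above n/2.
// Returns -1 when there is no dominant value (the array values are positive).
int dominantByCounting(const std::vector<int>& a) {
    std::map<int, std::size_t> counts;
    for (int v : a) ++counts[v];
    for (const auto& kv : counts)
        if (kv.second > a.size() / 2) return kv.first;
    return -1;
}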

Gabe
mike_b
  • Do you mean to find a value in the array which occurs most of the time, and return it if it occurs sizeof(array)/2 or more times? – songlj Feb 07 '13 at 05:11
  • I don't understand what you're searching for or why you would recurse. – Gabe Feb 07 '13 at 05:25
  • @Gabe, my guess is that he's looking for the mode of the array: the number that occurs most often, based on the comment above. Divide-and-conquer type recursion makes it faster. – thang Feb 07 '13 at 05:27
  • @mike_b: you could iterate the array, incrementing a `map`, then iterate the `map` to find the highest count... will be `O(nlogn)` as that's the cost of map insertion, and the iterations are less than that anyway. – Tony Delroy Feb 07 '13 at 05:32
  • I think that he's not supposed to use more memory. What you're suggesting (both of you) is basically a radix sort first, then determining the mode from there. **there is a constraint that the array must NOT be sorted prior to the determination**. – thang Feb 07 '13 at 05:33
  • ok, one more hint: use this fact *if the mode is greater than sizeof(array)/2* – thang Feb 07 '13 at 05:37
  • @songlj - Yes, the dominant value means more than half the items in an array are that value. – mike_b Feb 07 '13 at 05:46
  • @Gabe - that is the assignment, similar to merge sort complexity. You must recurse to get O(nlogn) – mike_b Feb 07 '13 at 05:47
  • @thang - yes, this is based on the master theorem and it's part of divide and conquer algorithms. Also, I don't think finding the mode as a function will help me here. – mike_b Feb 07 '13 at 05:48
  • @mike_b, so my point is that if you take the sizeof(array)/2 condition out, it becomes a lot more difficult... – thang Feb 07 '13 at 05:49
  • here are all the hints I gave. this is really quite simple. **1**. what you observed: *the dominant must be the dominant in the 1st or 2nd half or both*, **2**. determining whether a or b is a dominant value takes linear time, **3**. linear time * log steps = n log n, **4**. you need it to be *the dominant value*. if it were simply *the mode*, then the problem is much harder, and **5**. the base case is that the dominant value of an array of size 1 is that one number. – thang Feb 07 '13 at 05:51
  • I think determining a dominant by counting would be easy, and yes, linear time. But if you do a linear search on each log step, that surely is not nlogn. Again, related to divide and conquer, this means at each step we divide the problem in half until we can't divide any more. Hence, we get to an n=1 size array; that is the linear search. So it's actually log base 2 of n steps and 1*n linear searches when n=1 at step k = log base 2 of n. So divide first, and recursion to get the logn portion. – mike_b Feb 07 '13 at 06:15
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/24093/discussion-between-mike-b-and-thang) – mike_b Feb 07 '13 at 06:17
  • @mike_b, if you do a linear pass at every log step, then **it is nlogn**. in fact, every time you do the linear pass it is over k < n elements, so it's really < n log n depending on how far k is from n. – thang Feb 07 '13 at 06:31
  • no, that's an nlogn at each step. At each step i there are 2^i sets, each with n/(2^i) elements, so there are (2^i)(n/(2^i)) operations, or simply n operations at that search level. Look at the series: sum n from i=0 to logn. This is n + nlogn, or O(nlogn) by itself. That, with logn steps, is O(nlog^2(n)). I made a mistake above: I said A=B=D, but log base 2 of 2 is 1, hence A*T(n/B) + n^D -> A=B=2, D=1, so T(n) = 2T(n/2) + n. I'm still not sure how to prove my algorithm for the n part, except to say the series sum n/(2^i) from i=0 to logn is O(n), but I'm not sure how to get the n^D part in there. – mike_b Feb 07 '13 at 08:34
  • other than at each step, n/2 is searched [T(n)=2T(n/2)], but also, n/2^i results are returned and compared, so I guess that's the n to satisfy the master theorem. – mike_b Feb 07 '13 at 08:41
  • You're right about the nlogn, but it doesn't fit the master theorem, for that you also need the 2T(n/2) which makes linear search at each step log^2(n) when combined with 2T(n/2). – mike_b Feb 07 '13 at 08:44

2 Answers

2

If you want to find just the dominant mode of an array, and do it recursively, here's the pseudo-code:

def DominantMode(array):
    # if there is only one element, that's the dominant mode
    if len(array) == 1: return array[0]
    # otherwise, find the dominant mode of the left and right halves
    left = DominantMode(array[0:len(array)//2])
    right = DominantMode(array[len(array)//2:len(array)])
    # if both sides have the same dominant mode, the whole array has that mode
    if left == right: return left
    # otherwise, we have to scan the whole array to determine which one wins
    leftCount = sum(element == left for element in array)
    rightCount = sum(element == right for element in array)
    if leftCount > len(array) // 2: return left
    if rightCount > len(array) // 2: return right
    # if neither wins, just return None
    return None

The above algorithm is O(nlogn) time but only O(logn) space.
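Since you mentioned C++ in the question, here is a rough sketch of the same recursion in C++. The half-open index convention, the function name, and using -1 as a "no dominant value" sentinel (possible because your values are positive) are my own choices, not part of the assignment:

#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of the same divide-and-conquer idea, on the half-open range [lo, hi).
// Returns -1 when the range has no dominant value (element values are positive).
int dominantMode(const std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo == 1) return a[lo];               // base case: one element is its own dominant
    std::size_t mid = lo + (hi - lo) / 2;
    int left  = dominantMode(a, lo, mid);         // dominant value of the left half
    int right = dominantMode(a, mid, hi);         // dominant value of the right half
    if (left == right) return left;               // both halves agree
    // the "glue": count each candidate over the whole range, O(n) per level
    std::size_t leftCount  = std::count(a.begin() + lo, a.begin() + hi, left);
    std::size_t rightCount = std::count(a.begin() + lo, a.begin() + hi, right);
    if (leftCount  > (hi - lo) / 2) return left;
    if (rightCount > (hi - lo) / 2) return right;
    return -1;                                    // neither candidate dominates this range
}

Calling dominantMode(a, 0, a.size()) on a non-empty vector gives the answer; -1 means there is no dominant value.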

If you want to find the mode of an array (not just the dominant mode), first compute the histogram. You can do this in O(n) time (visiting each element of the array exactly once) by storing the histogram in a hash table that maps each element value to its frequency.

Once the histogram has been computed, you can iterate over it (visiting each element at most once) to find the highest frequency. Once you find a frequency larger than half the size of the array, you can return immediately and ignore the rest of the histogram. Since the size of the histogram can be no larger than the size of the original array, this step is also O(n) time (and O(n) space).

Since both steps are O(n) time, the resulting algorithmic complexity is O(n) time.
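A minimal sketch of that histogram approach, assuming a std::unordered_map is acceptable (the O(n) bounds are expected time for a hash table, not worst case); the function name and pair return type are just one way to package it:

#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Sketch of the histogram approach: O(n) expected time, O(n) space.
// Returns {mode, frequency}; the caller can test frequency > a.size() / 2
// to decide whether the mode is also the dominant value.
std::pair<int, std::size_t> modeByHistogram(const std::vector<int>& a) {
    std::unordered_map<int, std::size_t> hist;
    for (int v : a) ++hist[v];                    // build the histogram in one pass
    int mode = a.front();                         // assumes a non-empty array
    std::size_t best = 0;
    for (const auto& kv : hist) {
        if (kv.second > best) { mode = kv.first; best = kv.second; }
        if (best > a.size() / 2) break;           // dominant found: stop early
    }
    return {mode, best};
}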

Gabe
    @mike_b: Big-O notation just gives an upper bound; all O(n) functions are also O(nlogn). In other words, the algorithm I gave also has O(nlogn) complexity. – Gabe Feb 07 '13 at 05:57
  • almost. 1. you need to check the case where left==None or right==None 2. you can run this in place, so O(1) in space. – thang Feb 07 '13 at 06:35
  • @thang: What would I need to do differently if `left==None` or `right==None`? And I don't see how you can recurse log(n) deep without using k*log(n) stack space. – Gabe Feb 07 '13 at 06:55
  • The instructor is implying that we need to do it without simply a linear search. He doesn't want something O(n). He wants a recursively called function, similar to merge sort. What you have now is similar to what I've developed. Your right side also has overlap; it should be len(array)/2+1. Are your counts scoped only to that instance of the function? If so, there are times when that will not work; the count for a value needs to be maintained across recursive calls. – mike_b Feb 07 '13 at 07:29
  • @Gabe, you can actually straighten it out, but you're right. I made a mistake. In that code, you do need log n in stack space. With respect to left and right, I checked again and you implicitly handled them. Pretty cool. +1. – thang Feb 07 '13 at 07:35
  • @mike_b: My notation `[x:y]` only includes elements `x .. y-1`, so there's no overlap. What do you mean when you say "the count for a value needs to be maintained across recursive calls"? – Gabe Feb 07 '13 at 07:39
2

if some number occurs more than half the time, it can be done with O(n) time complexity and O(1) space complexity as follows:

int num = a[0], occ = 1;          // current candidate and its vote balance
for (int i = 1; i < n; i++) {
    if (a[i] == num) occ++;       // same as the candidate: one more vote
    else {
        occ--;                    // different value: cancel one vote
        if (occ < 0) {            // candidate voted out: switch to a[i]
            num = a[i];
            occ = 1;
        }
    }
}

since you are not sure whether such a number occurs, all you need to do is apply the above algorithm to get a candidate first, then iterate over the whole array a second time to count the occurrences of that number and check whether it is greater than half.
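A rough sketch of that second pass (the function name is mine; it takes the candidate num produced by the loop above):

// Sketch of the verification pass: returns true only if the candidate `num`
// from the first pass occurs more than n/2 times in the array.
bool isDominant(const int a[], int n, int num) {
    int occ = 0;
    for (int i = 0; i < n; i++)
        if (a[i] == num) occ++;
    return occ > n / 2;
}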

songlj
  • @thang why did you say I'm joking? is there anything wrong with my code? – songlj Feb 07 '13 at 07:10
  • the goal was to find a solution with O(nlogn) complexity. this requires recursion with log n steps. – mike_b Feb 07 '13 at 07:21
  • @songlj I think the algorithm does not work for some cases; try, for example, 1, 2, 1, 2, 1, 2, 1, 3, 3, 1. Sorry, I am adding this as an answer because I cannot comment yet. EDIT: If we know that the mode appears at least N/2 times in the array, this is the Boyer-Moore Majority Vote algorithm. http://stackoverflow.com/questions/3740371/finding-the-max-repeated-element-in-an-array discusses this specific instance of mode finding. – argsv Jul 25 '14 at 20:21