4

Design and analyze a linear time algorithm to determine whether there exists an element in a list of n elements which repeats itself at least n/10 times in the list.

How can I do this? I'm posting my own idea as an answer.

jkdev
  • 11,360
  • 15
  • 54
  • 77
starcaller
  • 979
  • 3
  • 16
  • 25
  • 1
    I assume you are not looking for an *average case* linear time algorithm (with additional `O(n)` space)? If so - it can be done by creating a hash based [histogram](http://en.wikipedia.org/wiki/Histogram) of the elements and then iterating the histogram. – amit Nov 24 '12 at 15:38
  • @amit please check my solution to see if it's correct, thanks. – starcaller Nov 24 '12 at 15:53

3 Answers3

12

I assume the elements are comparable.

Perform a selection algorithm for the: n/10th, 2n/10th, ..., 9n/10th, 10(n/10)th smallest elements1

These are your candidates. Check the #occurrences for each of them, and if one of them repeats at least n/10 times the answer is true. Otherwise, it is false.

Proof:
If an element appears at least n/10 times, then it must "intersect" with k*n/10 for some k (in a sorted list)2. Thus, this element will be chosen as a "candidate", and you will later discover (by counting) exactly how many times it appears, and will return true.

If no element is repeated n/10 times, the algorithm will trivially return false in the last step of checking each candidate.

Complexity:
Each selection algorithm is O(n), and it is done 10 times. Also for each candidate requires linear scan on the list, which is also O(n) totaling in O(n) overall—but with terrible constants.


Explanations:

(1) Selection algorithm will find the element which would be in the index n/10, 2n/10,...9n/10 in a sorted list, and the algorithm itself is only O(n)

(2) Let's look at the example of [1,2,..,100,100,..,100] (11 times 100).
Note that the list is sorted and the element 100 appears in: list[9*n/10] (index 9*n/10). The idea of the selection algorithm is, that—even if you shuffle the list—select(list,9*n/10) will always return the same element—in this case 100—since it is the 9n/10th element in the sorted list (this is what the algorithm does).

Now, you can see that for each element (let it be e) that repeats n/10 times, there is some index i such that in the sorted version of the list, all elements in indices i,i+1,...,i+n/10 will be e. One of these indices must be one of k*n/10 for some k (convince yourself why!). Thus, the selection algorithm on k*n/10 will yield e.

amit
  • 175,853
  • 27
  • 231
  • 333
  • Thanks for the solution, but would you please explain a little bit more on "If an element appears at least n/10 times, then it must "intersect" with k*n/10 for some k (in a sorted list). " still not quite get the idea. and also to keep the question simple, let's assume the elements in the list are all integers. – starcaller Nov 24 '12 at 16:06
  • @starcaller: I editted the answer (see edit, especially section 2). Hope it is more helpful. – amit Nov 24 '12 at 16:13
  • @amit but what if we have an unsorted list. – starcaller Nov 24 '12 at 16:17
  • Pedantic note: This (really good) solution supposes the sought element must appear more than `n/10` times in the list. If it can also appear exactly `n/10` times, the `k*n/10` points might miss it. Then one would use `k*n/11` for the selections. – Daniel Fischer Nov 24 '12 at 16:21
  • @DanielFischer: You are correct, I first though we are looking for *more then* `n/10` - but from second read it seems not to be the case. Editting to also check the first and last elements. – amit Nov 24 '12 at 16:23
  • Or, If you just add the n-th element (aka maximum) to the candidates, that suffices too for exactly `n/10` occurrences. Ah, duh, should finish reading the comments before adding, you already had that. – Daniel Fischer Nov 24 '12 at 16:24
  • @starcaller: It does not matter, if the list was [100,100,...100,99,98,...,1] selection algorithm will yield the same result for checking the same innput (formally, for each `k` and for each `list`: `select(list,k) == select(shuffle(list),k)`) – amit Nov 24 '12 at 16:26
  • You do have a thing for dashes xD – phant0m Nov 24 '12 at 16:31
  • Clever! Haven't seen this approach before. Are there any other problems that can be solved efficiently like this, i.e. by looking what at a constant number of element positions would contain after sorting? – j_random_hacker Nov 24 '12 at 16:57
3

Let me tell you a clever single pass algorithm to find a majority element (one with frequency higher than n/2) and you should see how to adapt it:

best = null
count = 0

foreach x in list:
   if (count == 0)
      count++
      best = x
   else if (best == x)
      count++
   else
      count--

return best

If there is a majority element, the above algorithm will find it (one pass, constant space). Once you figure out how it works you will see how it can be adapted to the n/10 case.

Andrew Tomazos
  • 66,139
  • 40
  • 186
  • 319
0

My solution is to separate the list into 10/n groups, and each of the groups contains 10 elements, then perform a randomized selection sort on each group, this will take O(1)*O(n) time which is O(n).

Since to satisfy the requirements, the candidate element has to show in each of the n/10 groups, we can perform a scan for each of the 10 elements in the first group, which will take 10*O(n) time.

So the overall time for the algorithm is O(n)+10*O(n), which is still O(n).

But this will not work if the elements in the groups are something like following:

1,2,3,4,5,6,7,8,9,10
11,11,11,11,11,11,11,11,11,11
...
11,11,11,11,11,11,11,11,11,11

My algorithm will return that no such element exist while actually 11 is the element which appears more than n/10 times.

phant0m
  • 16,595
  • 5
  • 50
  • 82
starcaller
  • 979
  • 3
  • 16
  • 25
  • 1
    You claim "the candidate element has to show in each of the n/10 groups", but it could be missing from some and appear multiple times in others. Also I think you mean "n/10 groups", not "10/n groups". – j_random_hacker Nov 24 '12 at 15:56
  • 1
    `the candidate element has to show in each of the n/10 groups` why? what about the list `[1,2,...,100,100,100,....,100]` (11 times 100). 100 won't appear in all lists, only in the last 2 lists – amit Nov 24 '12 at 15:57
  • @amit haha, just edited my solution. yea, that's the tricky part which has kept me thinking for a long time, any idea how to make my solution working? thanks – starcaller Nov 24 '12 at 16:00
  • @starcaller: (1) I tried to add my solution as an answer. (2) Your answer is wrong - so it will probably be downvoted. If you want it as a part of the question - you should edit and add this in the question itself - and not as an independent answer. – amit Nov 24 '12 at 16:06