I have a running stream of integers, how can I take largest k elements from this stream at any point of time.
-
See [this](http://cs.stackexchange.com/questions/35993/keep-kties-largest-elements-in-a-stream) and [this](http://stackoverflow.com/questions/10657503/find-running-median-from-a-stream-of-integers) – Karup Jun 18 '15 at 12:06
3 Answers
Easiest solution would be to populate a min-heap of size k
.
First, populate the heap with the first k
elements.
Next, for each element in the stream - check if it is larger than the heap's head, and if it is - pop the current head, and insert the new element instead.
At any point during the stream - the heap contains the largest k
elements.
This algorithm is O(nlogk)
, where n
is the number of elements encountered so far in the stream.
Another solution, a bit more complex but theoretically better in terms of asymptotic complexity in some cases, is to hold an array of 2k
elements.
First, load the first 2k elements.
Run Selection Algorithm, and find the highest k
out of them. Discard the rest, at this point you have only k
elements left in the array.
Now, fill the array again with the next k
elements, and repeat.
At each point, the array contains the k
largest elements, and up to k
more elements that are not the largest. You can run Selection Algorithm for each query on this array.
Run time analysis:
Maintaining the array: Each selection algorithm is O(2k) = O(k)
. This is done once every k
elements, so n/k
times if n
indicates the number of elements seen so far, which gives us O(n/k * 2k) = O(n)
.
In addition, each query is O(k)
, if the number of queries is Q
, this gives us O(n + Q*k)
run-time.
In order to this solution to be more efficient, we need Q*k < nlogk
Q*k < nlogk
Q < n/k * logk
So, if number of queries is limited as suggested above, this solution could be more efficient in terms of asymptotic complexity.
In practice, getting top k is usually done by using the min-heap solution, at least where I've seen the need of it.

- 175,853
- 27
- 231
- 333
-
Am i right to think that although this might be the easiest solution, depending on languages the "optimal" solution may vary ? I don't have anything in mind, just thinking out loud. – Laurent S. Jun 18 '15 at 12:08
-
@Bartdude See edit, added another solution that might prove even more efficeint in terms of asymptotic complexity. – amit Jun 18 '15 at 12:20
-
@amit, you might want to edit your answer - "check if it is smaller than the heap's head" should really be "check if it is larger than the heap's head". Smaller ones should be ignored. – stojke Jun 15 '16 at 12:58
-
-
The second approach is pretty interesting. I'll have to update my blog entry, http://blog.mischel.com/2011/10/25/when-theory-meets-practice/, to include that. Thanks! – Jim Mischel Jun 16 '16 at 13:46
This problem is also called Heavy Hitters . Count Sketch Data Structures are the solution for this .
Ref :

- 1,104
- 12
- 20
import heapq
def klargestelements(arr1,k):
q=heapq.nlargest(k,arr1)
return q
k=3
arr1=[1,2,4,5,6,7]
m=klargestelements(arr1,k)
print(m)
nsmallest or nlargest methods takes in the argument k and the array in which the min/max elements are to be found

- 598
- 5
- 16