I need to store top k most frequent elements in a stream. To estimate the frequency i use count-min-sketch algorithms. My stream is consist of keys(as string). So basically every time i encounter a new key in my stream, i calculate the frequency of current key so far by looking into the count-min-sketch data structure. However i am unable to store the top k most frequent keys.
My first idea was store them in a min-heap with fixed size of k. And i store [frequency, key] inside this min-heap with comparator comnpare frequency. So every time i get a frequency of a key, i try to see if the heap size is over k, if it is, then i compare frequency of current key with the top(smallest) frequency in min-heap, if my current key's frequency is larger then i pop the top, and insert my key into the heap.
However i realize the min-heap is not a set, which means it allows duplication. Let's say i have a very hot key, and i keep counting it in the stream, so every time i insert this [frequency, key] into the heap, eventually my heap will be full of the same key, only different frequencies.
So i am wondering is there a good way to store the top k distinct more frequent element in count-min-sketch.