given a stream of unknown size of words I need to find the k most frequent words at any given time, and the program should be able to run for a long time.
I've seen a post that is similar to the solution that I'm thinking of
The Most Efficient Way To Find Top K Frequent Words In A Big Word Sequence
How to find the most frequent element in a stream of numbers with additions and deletions allowed
but my problem is how I handle out of memory issue
in the posts, they don't mention this issue.
so I'm planning to store a hash with key as a word and value as word-frequency
and a min heap of size k that will hold the most frequent words
and for each word, I'll increase the count in the hash and check if I can enter it to the heap
the issue is that
the program should be able to run for a long time so there is a possibility that I'll get different words with a small frequency
and I need to save them in the hash just in case that the frequency will increase later on the stream
and this will create a memory problem because with time the hash can really big
I wonder if the only solution is to delete certain words
(but this will change the count)
I'm not sure how to handle that, any suggestion?