I have the got the following problem - I am given a text file containing huge number of words with repetition allowed. I need to write an algorithm that will output those 1000 words that appear most with their frequencies in decreasing order. Here is an example
**input.txt**
aa aa bb cc bb bb bb dd dd
**output.txt** (note - frequencies can be repeated)
bb 4
aa 2
dd 2
cc 1
And here is my way of solving this problem. First read all the words and store them in
HashMap
with word as the key. The end result of doing this is that I have all the words with their frequencies.Now I iterate over the
HashMap
and create an object {word, frequency} for each key value pair and then I insert it in aSortedSet
(I wrote a comparator for that).Now I just iterate over the
SortedSet
for 1000 times to get the result.
I would like to know a better way of solving this problem.
Thanks