MultiMap Data Structure for constant time ranking

Question

let me just say this is a HW project from UC Berkeley, so If you could give me a hint instead of the solution, It'd be great.

The problem is to create a data structure called "YearlyRecord" that maps a word to the number of times it is encountered in an ngram file. Here are the restrictions.

getting the count must be O(1)
getting the number of mappings must be O(1)
"put" must be on average O(log n)
getting the "rank" must be O(1)
counts, which provides a data structure of all of counts in increasing order
words, which provides a data structure of all words, in the order of counts

Note: "rank" is a function that takes a word and tells me its rank (e.g. 1, 2, 3, 4), which is based on the number of occurrences we have of it.

My solutions so far:

Hashmap (word -> count) provides O(1) getCount operation and O(1) size operation
make a private inner class that implements Comparator which compares based on the number of occurrences in the hashmap. Then use this comparator as input to a TreeSet of the words. This provides the "words" collection, which we can return in O(1) time.
Make another TreeSet of the counts. This provides an O(1) return of the counts in increasing order.
rank: Here is where I am stuck. It is clear that rank should be a map as well. I have the words in the TreeSet in increasing order of rank, but no indices to map to.

I don't see myself retrieving anything from it. "words" and "counts" gets the whole collection, that is O(1) because I just return the whole set. As far as O(lgn) "put," that is perfectly ok, I am actually constrained by a O(lgn) "put" on YearlyRecord — Sergey Kojoian, May 30 '15 at 16:32
OK, I think I see what you mean. But you were not actually constrained on the performance of `counts` and `words`, were you? — RealSkeptic, May 30 '15 at 16:46
no, the only constraints were on count, size, rank. The explicit ones at least. Making "counts" and "words" constant was my initiative. — Sergey Kojoian, May 30 '15 at 17:08
just as a reminder, the retrieval of the count of a word is done through the hashmap — Sergey Kojoian, May 30 '15 at 17:10
Yes, but when you return such data structures, you should return a *copy* of your internal structure, or whoever uses the reference can change it, and then you will give a bad data structure next time. — RealSkeptic, May 30 '15 at 17:12
yea I know returning a view is a problem. I wanted to take this one step at a time, finish the rank problem, have it run as fast as I can, then go back and deal with the view. At this point, we can safely assume nobody is going to mess with the view — Sergey Kojoian, May 30 '15 at 17:16
How do we determine the rank of a word if several words have the same number of occurrences? — kraskevich, May 30 '15 at 18:35
possible duplicate of [Design a system to keep top k frequent words real time](http://stackoverflow.com/questions/21692624/design-a-system-to-keep-top-k-frequent-words-real-time) — rici, May 31 '15 at 16:20
@SergeyKojoian: The answer to the linked question provides the outline of an algorithm which should help solve your problem. — rici, May 31 '15 at 16:22

MultiMap Data Structure for constant time ranking

0 Answers0