My teacher in my Data Structures class gave us an assignment to read in a book and count how many words there are. Thats not all; we need to display the 100 most common words. My gut says to sort the map, but I only need 100 words from the map. After googling around, is there a "Textbook Answer" to sorting maps by the value and not the key?
-
1`std::map`'s keys are already sorted. You'll be damned if they aren't. Also, I think `std::multiset` is a better container for your problem if you're storing the words redundantly. – Mark Garcia Jul 02 '13 at 02:56
-
2possible duplicate of [Elegant ways to count the frequency of words in a file](http://stackoverflow.com/questions/4888879/elegant-ways-to-count-the-frequency-of-words-in-a-file) – jogojapan Jul 02 '13 at 02:56
-
@MarkGarcia The way I understand the question, what is needed is to sort the words by frequency (in order to determine the 100 most common ones). A `std::map` (where the words are the keys) does not do this implicitly. – jogojapan Jul 02 '13 at 02:57
-
1@Mark Garcia, Maybe you read to quickly... I need to sort the VALUES not keys. – Ryan Allred Jul 02 '13 at 02:58
-
The way I decided to go about the problem is to create another map once the book is read and flip the values so the frequency is the key and word is the value. It just seems like a waste of memory. – Ryan Allred Jul 02 '13 at 03:00
-
"create another map" - that's easy in terms of simplicity and brevity of code, and likely a lot faster than the book parsing anyway so won't affect the program's overall big-O efficiency. Looking at that step in isolation though, it's significantly more memory and CPU efficient to copy the first 100 words/counts to a vector, sort them, then iterate over the remaining words/counts: iff greater than the lowest in the vector do an in-order insertion, dropping the last value. – Tony Delroy Jul 02 '13 at 04:06
-
possible duplicate of [Algorithm: A Better Way To Calculate Frequencies of a list of words](http://stackoverflow.com/questions/10200806/algorithm-a-better-way-to-calculate-frequencies-of-a-list-of-words) – Jerry Coffin Jul 04 '13 at 03:04
1 Answers
I doubt there's a "Textbook Answer", and the answer is no: you can't sort maps by value.
You could always create another map
using the values. However, this is not the most efficient solution. What I think would be better is for you to chuck the values into a priority_queue
, and then pop the first 100 off.
Note that you don't need to store the words in the second data structure. You can store pointers or references to the word, or even a map::iterator
.
Now, there's another approach you could consider. That is to maintain a running order of the top 100 candidates as you build your first map. That way there would be no need to do the second pass and build an extra structure which, as you pointed out, is wasteful.
To do this efficiently you would probably use a heap-like approach and do a bubble-up whenever you update a value. Since the word counts only ever increase, this would suit the heap very nicely. However, you would have a maintenance issue on your hands. That is: how you reference the position of a value in the heap, and keeping track of values that fall off the bottom.

- 60,864
- 6
- 61
- 103