If, as you suggest, you are looking for interesting or optimal algorithms to solve this problem, rather than the way to do it in Python (which is surely to use collections.Counter), a good first step is to google the name of the problem plus "Knuth".
Searching "Knuth order values by frequency" led me to a "Programming Pearls" article in the June 1986 Communications of the ACM, with a literate program by Knuth to print the k most common words in a file in order of frequency.
This particular article was memorable because, after Knuth presented an 8-page elaborately documented solution using custom data structures and clever algorithms, Doug McIlroy critiqued it with a 6-command Unix pipeline to do the same thing, namely
```sh
tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q
```

This transliterates non-alphabetic characters to newlines, squeezing out runs of newlines; changes uppercase to lowercase; sorts; replaces each run of repeated words with a single representative and a count; sorts in reverse numerical order; and outputs the first ${1} lines.
Anyway, what algorithm did Knuth use? He invented an intricate new data structure for the purpose, the hash trie. But a trie's properties are specifically useful for strings (it organizes them by prefix); for general data as in your question, a plain hash map is the more natural choice, and it is not substantially different from the hash trie here.
After scanning through the input and putting each item in the hash trie, or incrementing its count if already present, the trie has to be sorted by frequency. Knuth makes an array of 200 pointers, with the i-th one leading to a linked list of all the words with frequency i, and then an insertion-sorted list of all the words with frequency above 200. Some shenanigans are done to re-use the structure of the trie for these lists, so that additional storage isn't needed.
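In Python terms, that ordering step looks roughly like the sketch below (this is only the idea, with illustrative names; Knuth's actual program is in Pascal and, as noted, reuses the trie's own links instead of allocating separate lists):

```python
import bisect

def knuth_style_order(counts, bound=200):
    """Sketch of the ordering step. `counts` maps item -> frequency.
    Small frequencies get one bucket per count; the few large ones
    go into a single insertion-sorted overflow list."""
    buckets = [[] for _ in range(bound + 1)]    # buckets[c]: items seen exactly c times
    overflow = []                               # (count, item) pairs, kept sorted ascending
    for item, c in counts.items():
        if c <= bound:
            buckets[c].append(item)
        else:
            bisect.insort(overflow, (c, item))  # insertion sort, for the rare big counts
    result = [(item, c) for c, item in reversed(overflow)]  # largest counts first
    for c in range(bound, 0, -1):                           # then the bucketed counts
        result.extend((item, c) for item in buckets[c])
    return result
```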
This is based on his assumption that "almost all of the frequency counts will be small."
A more general method would be:
- Keep track of the maximum count seen (max_count) while you are adding items
- Then use max_count instead of 200: make a list L of max_count + 1 lists, go through your hash map, and add each item to L[c], where c is its frequency, as in the sketch below.
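A minimal sketch of that in Python (plain dict and lists; the names are just for illustration):

```python
def order_by_frequency(items):
    counts = {}
    max_count = 0
    # Count occurrences, tracking the largest count as we go
    for item in items:
        c = counts.get(item, 0) + 1
        counts[item] = c
        if c > max_count:
            max_count = c
    # Pigeonhole by frequency: L[c] holds the items seen exactly c times
    L = [[] for _ in range(max_count + 1)]
    for item, c in counts.items():
        L[c].append(item)
    # Read the buckets out from the highest frequency down
    return [(item, c) for c in range(max_count, 0, -1) for item in L[c]]
```

For example, `order_by_frequency("to be or not to be".split())` gives `[('to', 2), ('be', 2), ('or', 1), ('not', 1)]`. The whole thing is roughly linear in the number of items plus max_count, rather than the n log n of comparison-sorting the counts.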
Indeed, the July 1987 issue carried a follow-up with yet another literate program doing the same thing, by David Hanson, this one in C with loom (Knuth's was written in Pascal, with WEB). It uses a hash map to track the counts, then adds each item with frequency c to a list in L[c]. It wastes a bit of space by making L as long as the total number of words seen, since it doesn't determine max_count until it populates L, rather than while putting the counts into the hash map.
Implementation details aside, the overall scheme is the same as your idea #1: increment the counts in a dictionary, then sort by count.
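In Python, that is what collections.Counter (mentioned at the top) packages up; a minimal example:

```python
from collections import Counter

data = ['b', 'a', 'c', 'a', 'b', 'a']
print(Counter(data).most_common())   # [('a', 3), ('b', 2), ('c', 1)]
print(Counter(data).most_common(2))  # just the 2 most common: [('a', 3), ('b', 2)]
```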
So it's a pretty good bet that this is the optimal strategy.