I would like to show to you a "more-modern" C++ approach by using existing algorithms from the standard C++ library.
We will also use a standard solution for this kind of question. Because, the basic, underlying question is, how to sort an associative container, like std::map
or std::unordered_map
by its value, and not by its key.
Sorting by value is of course not possible for these containers. Reason:
- It is the basic property of a
std::map
that it is sorted by its key values.
- And, the
std::unordered_map
, has the logical property that it is unordered and, addionally, has a "ForwardIterator" and not a "RandomAccessIterator", as required by sort-functions.
The answer and the standard solution is, to put the data into a different container and sort this instead.
I will now show you the code. The code contains your function "printFrequency" in a "modern-C++" style. Later, I will explain the code in detail, line by line.
#include <iostream>
#include <string>
#include <algorithm>
#include <regex>
#include <map>
#include <vector>
#include <utility>
#include <iomanip>
// Some test data
std::string testString
{ R"(Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren,
no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet,
consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et
dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo
dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est
Lorem ipsum dolor sit amet.)" };
// We want to output this number of lines
constexpr size_t NumberOfOutputLines = 20U;
// Some abbreviation for our pair to store a word and its associated counter
using CounterType = std::pair<std::string, size_t>;
// Enhance our pair with 2 new functionalities: Output and comparison (for sorting)
struct CTO : public CounterType {
friend std::ostream& operator << (std::ostream& os, const CTO& co) {
return os << std::setw(30) << co.first << " --> " << co.second << '\n';
}
friend bool operator < (const CTO& p1, const CTO& p2) { return p2.second < p1.second; }
};
// Regex Helpers
// Regex to find a word
static const std::regex reWord{ R"(\w+)" };
// Result of search for one word in the string
static std::smatch smWord;
void printFrequency(std::string& data) {
// Here we will store the result. The word and the associated "count"
std::map<std::string, size_t> counter{};
// Iterate over all words in the string with a simple for loop and increment counter
for (std::string s{ data }; std::regex_search(s, smWord, reWord); s = smWord.suffix())
counter[smWord[0]]++;
// Copy result into a vector and sort it
std::vector<CTO> countedWords(counter.begin(), counter.end());
std::sort(countedWords.begin(), countedWords.end());
// Handle special case, where there are less counts available than expected
const size_t oCount{ std::min(countedWords.size(),NumberOfOutputLines) };
// Output sorted result
std::copy_n(countedWords.begin(), oCount, std::ostream_iterator<CTO>(std::cout));
}
// Driver code
int main() {
printFrequency(testString);
return 0;
}
First, some words to the logical design of the function. Also for counting, we will use the standard approach with a std::map
or std::unordered_map
, use their index operator to either create or retrieve a key-value-pair and increment the value, so the counter.
For iterating over all words in the input string, we will use a std::regex
. This will give us the maximum flexibility and is very easy to use.
Ok, then, we start with defining a constexpr
to specify the number of requested output lines.
We define an alias for our "Counter Type", for easier typing and better understanding. This is a std::pair<std::string, size_t>
, a std::string
for the word and a size_t
for the counter part.
Because the std::pair
has no default output function and "less-than" operator needed for sorting, we create a new customized "CounterType" with the name CTO by deriving from our std::pair
and add an output (inserter) operator and a less-than operator.
With that, we decouple functionality from the main software part and associate it with its type. That is the object oriented approach.
For the output operator (the inserter operator), we simply write the 2 parts of the std::pair
in a nice way to the output stream. No big deal.
In the less-than operator, we compare p2 wit p1 (and not vice versa). With that we reverse the sorting direction and get values sorted from big to small.
For the regex we use the easiest definition for a word, which is \w+, so, one or many alpha numeric characters. In the std::smatch
we store the result of the std::regex_match
or std::regex_search
function.
In the function body we first define our "counter" variable, here a std::map
. Words will be associated with the count of their occurence. I use a std::map
here, because I do want to have a sorted output for values having the same count.
Then, we want to iterate over all words of the input string. For iterating over "things" in C++, we usually use a for loop. First, we initialize the "loop"-variable with our input string, then, we search for the next word (and continue, until no more word is available). Last, we point to the next position in the input string after the previously found word. So, a classical for loop. Please check the functionality of ```std::regex_search```` here.
In the loop body, we use the std::map
's index operator. It will first either create a new word with default count 0 (if there was no such entry yet), or, it will search for an existing entry and find it. In any case, a reference to the value for the word is returned. That will be incremented. And so, we count occurences of words.
Now we will define a variable "countedWords" of type std::vector
. We will use its "range"-constructor. See here, constructor number 5. This will define and initialize our variable with the contents of the std::map
. Next, we will sort the std::vector
and this will work, because our custom data type has a less-than-operator.
Basically now everything done, and just output is needed. We want to copy a certain number of results to std::cout
. But it may be that there are less elements in the std::vector
than we want to output. So we will take the minimum value of elements in the std::vector
and the requested number and proceed with that.
For the output, we will use std::copy_n
. It will take elements in the std::vector
as source and send them to std::cout
with the help of the std::ostream_iterator
. The std::ostream_iterator
will simply call the above defined inserter operator for the elements in the std::vector
At the end, we define a simple "main" function as driver code and use some test string as input data.
The result is a compact and easy to understand function. Straight forward and using "modern-C++"
Of course there are millions of other possible implementations.
If there are any questions, then I am happy to answer.