What is the fastest way to get the frequency of numbers in an array in C++?

Question

My method creates an std::map<int, int> and populates it with the number and its frequency by iterating over the array once, but I'm wondering if there's a quicker way without using a map.

Using an `std::unordered_map` instead of a `std::map` probably answers your question (though I doubt it fits you'r intent). — Jerry Coffin, Jul 28 '21 at 21:11
Is there a limit to the range of the numbers? If so, you could perhaps use an array, `std::array`, or `std::vector`. — Fred Larson, Jul 28 '21 at 21:12
@FredLarson Not really, the numbers can be anything, but positive only. There's a bit of a dirty way to do this using a vector, by checking if the current number I'm iterating is different to a previously recorded number, and if so add a 1 to the vector, else increment current count in the vector. It just seems dirty. — Neosapien, Jul 28 '21 at 21:36
@JerryCoffin Not sure how that behaves differently than a map in this case... — Neosapien, Jul 28 '21 at 21:37
unordered_map should be faster because of the lack of ordering: [https://stackoverflow.com/questions/2196995/is-there-any-advantage-of-using-map-over-unordered-map-in-case-of-trivial-keys](https://stackoverflow.com/questions/2196995/is-there-any-advantage-of-using-map-over-unordered-map-in-case-of-trivial-keys) — drescherjm, Jul 28 '21 at 21:40
@Neosapien *if the current number I'm iterating is different to a previously recorded number* -- And therein lies the issue. How do you check if the number has been previously recorded? You've described what is done at a high-level, but didn't go into low-level details. What data structure(s) would you use to perform this task efficiently? — PaulMcKenzie, Jul 28 '21 at 21:46
Also, for vector, you would probably place the item in a sorted container after performing a binary search for the item (i.e. `std::lower_bound / std::upper_bound).` — PaulMcKenzie, Jul 28 '21 at 21:49
`unordered_map` isn't always faster. Deviant cases can turn some implementations into a glorified linked list. As always, profile. — user4581301, Jul 28 '21 at 21:50
*There's a bit of a dirty way to do this using a vector* -- The other issue with `map` is that it does not store its data in contiguous memory, unlike a vector. This is one reason why persons use a sorted vector over a map, thus it need not be considered "dirty" if using a vector. You could have a sorted vector of pairs to mimic a map. — PaulMcKenzie, Jul 28 '21 at 21:57
Without more knowledge from the actual data (5 numbers or 50 millions?), range and how it is used (once or "real time"), it is hard to tell and if the data is relatively small, it does not matter much so one would favor code readability. If the data is really big and you want to improve performance, then profiling is the right thing to do. Also, if you fill all data first then need count at then end only, a `vector` sorted at the end might make sense. — Phil1970, Jul 28 '21 at 22:21

463035818_is_not_an_ai · Accepted Answer · 2021-07-28T22:21:26.253

std::unordered_map<int,int> can count frequencies as well but its operator[] has complexity (cppreference):

Average case: constant, worst case: linear in size.

Compared to

Logarithmic in the size of the container.

with a std::map.

When the maximum number is small you can use an array, and directly count:

 for (const auto& number : array) counter[number]++;

Admittetly, all this has already been said in comments, so I'll also add this one: You need to measure. Complexity is only about asymptotic runtime, while for given input size a std::map can actually be faster.

score 1 · Answer 2 · answered Aug 02 '22 at 23:00

NOTE: ValueType, DifferenceType are defined to be

template <std::input_iterator I>
using ValueType = typename std::iterator_traits<I>::value_type;

template <std::input_iterator I>
using
DifferenceType = typename std::iterator_traits<I>::difference_type;

If the array is sorted, you can use std::equal_range to find the range of elements that is equal to x. With concepts you write:

// I2 is homomorphic to std::pair<I, unsigned>
// [first, last) is partially ordered with respect to I::value_type
// return value is d_first + |{x | x in [first, last)}|
// R is a relation over I, compare element using R
template <std::random_access_iterator I, std::forward_iterator I2,
std::relation<bool, ValueType<I>> R = std::less<ValueType<I>>>
requires(std::regular<ValueType<I>> &&
std::is_constructible_v<ValueType<I2>, I, DifferenceType<I>>)

I2 frequency_sorted(I first, I last, I2 d_first, R r = R())
{
  while(first != last)
  {
    auto [left, right] = std::equal_range(first, last, *first, r);
    *d_first = {left, std::distance(left, right)};
    ++d_first;
    first = right;
  }
  return d_first;
}

If you have limited resources, you can truncate the result and have:

// I2 is homomorphic to std::pair<I, unsigned>
// [first, last) is partially ordered with respect to I::value_type
// return value is a pair, where the first element is 
// the starting point of subsequence [first, last) where such
// subsequence is unevaluated
// the second element is 
// - d_last if |{x | x in [first, last)}| >= d_last - d_first
// - d_first + |{x | x in [first, last)}| if otherwise
template <std::random_access_iterator I, std::forward_iterator I2,
std::relation<bool, ValueType<I>> R = std::less<ValueType<I>>>
requires(std::regular<ValueType<I>> &&
std::is_constructible_v<ValueType<I2>, I, DifferenceType<I>>)

std::pair<I, I2>
frequency_sorted_truncate(I first, I last, I2 d_first, I2 d_last, R r = R())
{
  while(first != last && d_first != d_last)
  {
    auto [left, right] = std::equal_range(first, last, *first, r);
    *d_first = {left, std::distance(left, right)};
    ++d_first;
    first = right;
  }
  return {first, d_first};
}

These two functions allow you to pass in any relation, and the default comparison uses operator<.

If your array is unsorted, and the size of the array is large enough, then it might be a good idea to just sort the array and use the algorithm. Hashing might be tempting but it creates cache miss and might not be as fast as you would expect. You can try both methods and measure which one is faster, you are welcome to tell me the result.

My compiler version is g++ 11.2.11, I think the code can be compiled with a C++ 20 compiler. If you don't have one, simply replace the concepts part with typename, I think by doing that you will only need a C++ 17 compiler(due to structural binding).

Please tell me whether my code can be improved.

What is the fastest way to get the frequency of numbers in an array in C++?

2 Answers2