1

I have a list of values, with some recurrences, e.g. : {1,2,2,3,3,3,3,7,8,1}

I want to store the unique values in this list in a data structure along with their counts.

    --------------
    |value |count|
    --------------
    |  1   |  2  |
    --------------
    |  2   |  2  |
    --------------
    |  3   |  4  |
    --------------
    |  7   |  1  | 
    --------------
    |  8   |  1  |
    --------------

which c++ standard library data structure will be the most efficient in doing so?

edit: i wont be modifying the structure in any way, i just want to know the count as the count will help me determine the output to a programming question.

  • 2
    A vector of pairs, or a pair of vectors, depending on your most common access pattern. But generally either would be OK. – Konrad Rudolph Jul 08 '20 at 10:05
  • 2
    A `std::map` or a `std::unordered_map` could be used to – Damien Jul 08 '20 at 10:06
  • 2
    *which c++ stl data structure will be the most efficient in doing so* -- You only listed 10 numbers. In your real application, how many unique numbers are you expected to have? A thousand? A million? – PaulMcKenzie Jul 08 '20 at 10:07
  • what is the most efficient depends on what operations you intend to use on the data structure. Different containers make different tradeoffs between insertion / lookup / removal / etc. Please clarify what you mean with "efficient" – 463035818_is_not_an_ai Jul 08 '20 at 10:08
  • 1
    @Damien Both of these structures are much less efficient. Unless you need the added functionality they provide, don’t use them. – Konrad Rudolph Jul 08 '20 at 10:08
  • 1
    @KonradRudolph It is difficult to discuss about efficiency if we don't know how this structure will be used. For example, a `std::map` is often used as a simple way to get the count of each value: `m[x[i]]++; `. – Damien Jul 08 '20 at 10:13

2 Answers2

6

First note that asking for "the most efficient" data structure is not a proper characterization of your requirements. Do you want a solution which:

  • is the fastest to use? In what use cases?
  • takes up the least amount of memory?
  • is the most maintainable/readable?
  • is the least bug-prone?
  • is the fastest to write?
  • exists alongside the original data structure (for the uncounted values), or in its stead?

You see, there are different kinds and aspects to efficiency.

Having said that, you could try:

  • A simple and straightforward solution was suggested to you by @songyuanyao and @RahulGupta: Use a map - an std::map if you want to interate your value-counts in increasing order, or an std::unordered_map if you don't care about the order. This will be easy to write and maintain, and kind-of-ok in terms of the time of inserting or removing an element. Still, both of these map structures are quite slow, so you might rethink whether you even want a standard-library map implementation.

  • An alternative solution - which is more efficient in terms of space and time if you perform a lot of reads and few inserts/updates - is what @KonradRudolph suggested in a comment: std::pair<std::vector<value_type>, std::vector<count_type>> or std::vector<std::pair<value_type, count_type>>; and make sure the count_type is large enough that you don't exceed it, but as small as it can be to reduce the amount of time you need for reading the whole structure. These will use a lot less space than the maps, since there are no bucket-lists, no empty

    Note that the choice between a vector-of-pairs or a pair-of-vectors is a common dilemma in data structure design, and is also known as "structure of arrays vs array of structs", or SoA vs AoS. See a concrete example here on the site and there are many others. AoS is better when you usually access both fields, and need the corresponding values together; SoA is better when you often need just one field (e.g. you want to sum up the counts between some range of values; or you want to obtain the set of all prime values etc.) This also relates to the architecture of databases - row-oriented vs column-oriented, with the former being more appropriate for transaction processing and the latter for analytic workloads.

einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • 1
    @ThePhobiCeron: "Thank you" messages are redundant on StackOverflow - the way we express this sentiment is by upvoting, and by accepting if the answer resolves the problem. Nothing more is necessary... (PS - I know that there's now a "thanks" widget, but this is [an unpopular experiment](https://meta.stackoverflow.com/q/398367/1593077) which will hopefully go away.) – einpoklum Jul 08 '20 at 12:05
0

You can use a map in c++ declaration can be done as

map<int,int>map_name;

for inserting you can run a loop

for(auto itr:list_name)
    map_name[itr]++;

for(auto c:map_name)
cout << c.first << " " << c.second << endl;