3

Given a std::vector<std::vector<int>>:

  • I want to output a sorted std::vector<std::vector<int>>
  • that contains only unique std::vector<int>
  • as well as the frequency (i.e., count) of these unique std::vector<int>

My question is twofold:

  • how can I efficiently achieve this relying only on the standard library?
  • what is causing the code below to perform poorly?

I tried the following:

std::vector<std::vector<int>> responseVectors 
{
    { 2,  3,  2 },
    { 3,  2,  3 }, 
    { 3,  2,  3 }, 
    { 3,  3,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 2,  1,  2 }, 
    { 0,  1,  2 }, 
    { 2,  3,  2 },
    { 3,  2,  3 }, 
    { 3,  2,  3 }, 
    { 3,  3,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 2,  1,  2 }, 
    { 0,  1,  2 }
};

std::vector<std::vector<int>> uniqueResponseVectors;
uniqueResponseVectors.reserve(responseVectors.size());

std::sort(responseVectors.begin(), responseVectors.end());
std::unique_copy(responseVectors.begin(), responseVectors.end(), std::back_inserter(uniqueResponseVectors));

std::vector<int> freqTotal;
freqTotal.reserve(uniqueResponseVectors.size());

for(auto&& vector : uniqueResponseVectors)
{
    int count = std::count(responseVectors.begin(), responseVectors.end(), vector);
    freqTotal.push_back(count);
}

And the output should be:

for(auto&& vector : uniqueResponseVectors)
{
    for(auto&& value : vector)
    {
        std::cout << "\t" << value << "\t";
    }

    std::cout << "\n";
}

// Printed result for the `uniqueResponseVectors`:
//
//    0               1               2
//    1               2               3
//    2               1               2
//    2               3               2
//    3               2               3
//    3               3               3

// Similarly for the `freqTotal`:
//
//    2
//    6
//    2
//    2
//    4
//    2

Additional context:

  • when I tried the code above against a larger dataset (i.e., responseVectors with a size of 100000 and 12 as the size of each std::vector<int>), it was slow.

  • I also tried to manipulate responseVectors directly by calling std::unique and erase() on it, in order avoid creating a copy. Then I used an iterator loop to std::count how many times a unique std::vector<int> occurs. However, this proved even slower.

Deduplicator
  • 44,692
  • 7
  • 66
  • 118
Mihai
  • 2,807
  • 4
  • 28
  • 53
  • 3
    if you use a `map` instead of vector you get what you want for free. Closely related: https://stackoverflow.com/questions/54476881/how-to-apply-the-concept-of-counting-occurrences-on-strings-variables-in-c (question is slightly different but the answer is a perfect match) – 463035818_is_not_an_ai Feb 01 '19 at 12:20
  • 1
    what makes your solution slow compared to using a map is that you are traversing the input several times, when you use a map you just insert the elements – 463035818_is_not_an_ai Feb 01 '19 at 12:26
  • @user463035818 Can you please elaborate in an answer? I appreciate the input, but not sure how to switch to using a `map`. – Mihai Feb 01 '19 at 12:30
  • 1
    did you look at the link above? I could not write a better answer than given there – 463035818_is_not_an_ai Feb 01 '19 at 12:39
  • Sorry, I completely missed the link. I am checking the answer (i.e., https://stackoverflow.com/a/54477263/5252007) right now! – Mihai Feb 01 '19 at 12:41
  • 1
    actually I would flag your question as a duplicate. I didnt do it, because then your question would get closed immediately. If you read the answer and it did solve your problem, I think you could also flag your own question as a duplicate of the other one – 463035818_is_not_an_ai Feb 01 '19 at 12:44
  • I am trying that implementation right now. If it solves my problem, I will write an answer below explaining why, provide a link to @YSC solution and flag it. Thanks! – Mihai Feb 01 '19 at 12:48
  • 1
    @Mihai It is good you're using STL, but your question has far more to do with using the proper data structure(s). A map, hash table, etc. should have been the choice, and given that, you have `std::map`, `std::unordered_map`, etc. – PaulMcKenzie Feb 01 '19 at 13:00
  • Wow! What I just learned is fantastic; made my day! Thanks for your suggestions @user463035818 and @PaulMcKenzie! – Mihai Feb 01 '19 at 13:31
  • If you want to have unique elements and keep the same spirit of the vector, you can use `std::set`. – doudouremi Feb 01 '19 at 14:04
  • Possible duplicate of [how to apply the concept of counting occurrences on strings variables in c++](https://stackoverflow.com/questions/54476881/how-to-apply-the-concept-of-counting-occurrences-on-strings-variables-in-c) – Mihai Feb 01 '19 at 14:25
  • always glad to help. `std::map` already made my day several times ;). Just beware that performance wise there are usage patterns where `std::vector` still beats `std::map` (just consider the different complexity of their methods to insert, sort, etc.) – 463035818_is_not_an_ai Feb 01 '19 at 14:45

1 Answers1

1

Based on this input, I tried to particularize the solution provided by YSC to my case, in order to better understand what is going on. It boils down to using std::map which is a sorted associative container:

std::map<std::vector<int>, int> SortAndCountOccurences(std::vector<std::vector<int>>& vectors)
{
    std::map<std::vector<int>, int> result;

    std::for_each(vectors.begin(), vectors.end(), [&result](auto const& vector) {
            ++result[vector]; 
    });

    return result;
}

With the following usage:

std::vector<std::vector<int>> responseVectors 
{
    { 2,  3,  2 },
    { 3,  2,  3 }, 
    { 3,  2,  3 }, 
    { 3,  3,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 2,  1,  2 }, 
    { 0,  1,  2 }, 
    { 2,  3,  2 },
    { 3,  2,  3 }, 
    { 3,  2,  3 }, 
    { 3,  3,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 2,  1,  2 }, 
    { 0,  1,  2 }
};

std::map<std::vector<int>, int> sortedVectors = SortAndCountOccurences(responseVectors);

Which will output:

for(auto&& vector : sortedVectors)
{
    for(auto&& value : vector.first)
    {
        std::cout << "\t" << value << "\t";
    }

    std::cout << "-> x" << vector.second << " times" << "\n";
}

//    0               1               2       -> x2 times
//    1               2               3       -> x6 times
//    2               1               2       -> x2 times
//    2               3               2       -> x2 times
//    3               2               3       -> x4 times
//    3               3               3       -> x2 times

Note: The solution by YSC can be generalized to anything iterable.

Mihai
  • 2,807
  • 4
  • 28
  • 53
  • Better use range-for instead of `std::for_each()`. – Deduplicator Feb 01 '19 at 15:42
  • Thanks for the tip! What would be the advantage of a range-for? – Mihai Feb 01 '19 at 16:32
  • 1
    Range-for is simpler than `std::for_each()` with lambda, it doesn't rely so much on the compiler to inline everything, and it allows you to use `break`, `return` and other normal flow-control. In addition, it's part of the core language instead of the standard library. – Deduplicator Feb 01 '19 at 16:58
  • `std::for_each` is great whet you have a range (couple of begin/end iterators). range-for is great whet you have a collection (container, array, ...). This is a style issue, nothing serious here. – YSC Feb 01 '19 at 16:58
  • @YSC When the Range TS is finally available, one can trivially adapt any iterator-pait for use with range-for, making `std::for_each()` pretty obsolete. – Deduplicator Feb 01 '19 at 18:20