Given a std::vector<std::vector<int>>
:
- I want to output a sorted
std::vector<std::vector<int>>
- that contains only unique
std::vector<int>
- as well as the frequency (i.e., count) of these unique
std::vector<int>
My question is twofold:
- how can I efficiently achieve this relying only on the standard library?
- what is causing the code below to perform poorly?
I tried the following:
std::vector<std::vector<int>> responseVectors
{
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 },
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 }
};
std::vector<std::vector<int>> uniqueResponseVectors;
uniqueResponseVectors.reserve(responseVectors.size());
std::sort(responseVectors.begin(), responseVectors.end());
std::unique_copy(responseVectors.begin(), responseVectors.end(), std::back_inserter(uniqueResponseVectors));
std::vector<int> freqTotal;
freqTotal.reserve(uniqueResponseVectors.size());
for(auto&& vector : uniqueResponseVectors)
{
int count = std::count(responseVectors.begin(), responseVectors.end(), vector);
freqTotal.push_back(count);
}
And the output should be:
for(auto&& vector : uniqueResponseVectors)
{
for(auto&& value : vector)
{
std::cout << "\t" << value << "\t";
}
std::cout << "\n";
}
// Printed result for the `uniqueResponseVectors`:
//
// 0 1 2
// 1 2 3
// 2 1 2
// 2 3 2
// 3 2 3
// 3 3 3
// Similarly for the `freqTotal`:
//
// 2
// 6
// 2
// 2
// 4
// 2
Additional context:
when I tried the code above against a larger dataset (i.e.,
responseVectors
with a size of 100000 and 12 as the size of eachstd::vector<int>
), it was slow.I also tried to manipulate
responseVectors
directly by callingstd::unique
anderase()
on it, in order avoid creating a copy. Then I used an iterator loop tostd::count
how many times a uniquestd::vector<int>
occurs. However, this proved even slower.