The main problem here is that the std::sort
algorithm cannot operate on multiple vectors at the same time.
For the purpose of demonstration, let's assume you have a std::vector<int> v1
and a std::vector<char> v2
(of the same size of course) and you want to sort both depending on the values in v1
. To solve this, I basically see three possible solutions, all of which generalize to an arbitrary number of vectors:
1) Put all your data into a single vector.
Define a struct
, say Data
, that keeps an entry of every data vector.
struct Data
{
int d1;
char d2;
// extend here for more vectors
};
Now construct a new std::vector<Data>
and fill it from your original vectors:
std::vector<Data> d(v1.size());
for(std::size_t i = 0; i < d.size(); ++i)
{
d[i].d1 = v1[i];
d[i].d2 = v2[i];
// extend here for more vectors
}
Since everything is stored inside a single vector now, you can use std::sort
to bring it into order. Since we want it to be sorted based on the first entry (d1
), which stores the values of the first vector, we use a custom predicate:
std::sort(d.begin(), d.end(),
[](const Data& l, const Data& r) { return l.d1 < r.d1; });
Afterwards, all data is sorted in d
based on the first vector's values. You can now either work on with the combined vector d
or you split the data into the original vectors:
std::transform(d.begin(), d.end(), v1.begin(),
[](const Data& e) { return e.d1; });
std::transform(d.begin(), d.end(), v2.begin(),
[](const Data& e) { return e.d2; });
// extend here for more vectors
2) Use the first vector
to compute the indices of the sorted range and use these indices to bring all vector
s into order:
First, you attach to all elements in your first vector
their current position. Then you sort it using std::sort
and a predicate that only compares for the value (ignoring the position).
template<typename T>
std::vector<std::size_t> computeSortIndices(const std::vector<T>& v)
{
std::vector<std::pair<T, std::size_t>> d(v.size());
for(std::size_t i = 0; i < v.size(); ++i)
d[i] = std::make_pair(v[i], i);
std::sort(d.begin(), d.end(),
[](const std::pair<T, std::size_t>& l,
const std::pair<T, std::size_t>& r)
{
return l.first < r.first;
});
std::vector<std::size_t> indices(v.size());
std::transform(d.begin(), d.end(), indices.begin(),
[](const std::pair<T, std::size_t>& p) { return p.second; });
return indices;
}
Say in the resulting index vector
the entry at position 0
is 8
, then this tells you that the vector
entries that have to go to the first position in the sorted vector
s are those at position 8
in the original ranges.
You then use this information to sort all of your vectors
:
template<typename T>
void sortByIndices(std::vector<T>& v,
const std::vector<std::size_t>& indices)
{
assert(v.size() == indices.size());
std::vector<T> result(v.size());
for(std::size_t i = 0; i < indices.size(); ++i)
result[i] = v[indices[i]];
v = std::move(result);
}
Any number of vector
s may then be sorted like this:
const auto indices = computeSortIndices(v1);
sortByIndices(v1, indices);
sortByIndices(v2, indices);
// extend here for more vectors
This can be improved a bit by extracting the sorted v1
out of computeSortIndices
directly, so that you do not need to sort it again using sortByIndices
.
3) Implement your own sort function that is able to operate on multiple vector
s. I have sketched an implementation of an in-place merge sort that is able to sort any number of vector
s depending on the values in the first one.
The core of the merge sort algorithm is implemented by the multiMergeSortRec
function, which takes an arbitrary number (> 0) of vectors of arbitrary types.
The function splits all vectors into first and second half, sorts both halves recursively and merges the the results back together. Search the web for a full explanation of merge sort if you need more details.
template<typename T, typename... Ts>
void multiMergeSortRec(
std::size_t b, std::size_t e,
std::vector<T>& v, std::vector<Ts>&... vs)
{
const std::size_t dist = e - b;
if(dist <= 1)
return;
std::size_t m = b + (dist / static_cast<std::size_t>(2));
// split in half and recursively sort both parts
multiMergeSortRec(b, m, v, vs...);
multiMergeSortRec(m, e, v, vs...);
// merge both sorted parts
while(b < m)
{
if(v[b] <= v[m])
++b;
else
{
++m;
rotateAll(b, m, v, vs...);
if(m == e)
break;
}
}
}
template<typename T, typename... Ts>
void multiMergeSort(std::vector<T>& v, std::vector<Ts>&... vs)
{
// TODO: check that all vectors have same length
if(v.size() < 2)
return ;
multiMergeSortRec<T, Ts...>(0, v.size(), v, vs...);
}
In order to operate in-place, parts of the vector
s have to be rotated. This is done by the rotateAll
function, which again works on an arbitrary number of vector
s by recursively processing the variadic parameter pack.
void rotateAll(std::size_t, std::size_t)
{
}
template<typename T, typename... Ts>
void rotateAll(std::size_t b, std::size_t e,
std::vector<T>& v, std::vector<Ts>&... vs)
{
std::rotate(v.begin() + b, v.begin() + e - 1, v.begin() + e);
rotateAll(b, e, vs...);
}
Note, that the recursive calls of rotateAll
are very likely to be inlined by every optimizing compiler, such that the function merely applies std::rotate
to all vectors. You can circumvent the need to rotate parts of the vector, if you leave in-place and merge into an additional vector
. I like to emphasize that this is neither an optimized nor a fully tested implementation of merge sort. It should serve as a sketch, since you really do not want to use bubble sort whenever you work on large vectors.
Let's quickly compare the above alternatives:
- 1) is easier to implement, since it relies on an existing (highly optimized and tested)
std::sort
implementation.
- 1) needs all data to be copied into the new
vector
and possibly (depending on your use case) all of it to be copied back.
- In 1) multiple places have to be extended if you need to attach additional
vector
s to be sorted.
- The implementation effort for 2) is mediocre (more than 1, but less and easier than 3), but it relies on optimized and tested
std::sort
.
- 2) cannot sort in-place (using the indices) and thus has to make a copy of every
vector
. Maybe there is an in-place alternative, but I cannot think of one right now (at least an easy one).
- 2) is easy to extend for additional
vector
s.
- For 3) you need to implement sorting yourself, which makes it more difficult to get right.
- 3) does not need to copy all data. The implementation can be further optimized and can be tweaked for improved performance (out-of-place) or reduced memory consumption (in-place).
- 3) can work on additional
vector
s without any change. Just invoke multiMergeSort
with one or more additional arguments.
- All three work for heterogeneous sets of
vector
s, in contrast to the std::vector<std::vector<>>
approach.
Which of the alternatives performs better in your case, is hard to say and should greatly depend on the number of vector
s and their size, so if you really need optimal performance (and/or memory usage) you need to measure.
Find an implementation of the above here.