Sorting multiple vectors in single sort call C++

Question

This answer demonstrates how to efficiently obtain an indices vector using std::sort on a vector of values using the nice new-ish C++11 functionality (there's also a variety of duplicates of that question as well). It also hints that you can obtain the double output of both the sorted vector and the sorted indices by "using an extra vector." However, the only way I can achieve this is by calling std:sort a second time. I'm working with arrays with lengths of tens, maybe hundreds, of thousands of elements trying to focus on efficiency. Is it possible to obtain both the sorted vector and the indices of the sort from a single call to std::sort?

More generally, my question is: can one sort multiple vectors with a single sort call? The assumption is the sorting order is based on only one of the supplied vectors.

What I've come up with in the meantime is below (a slight modification to the code in the linked answer). As you can see, it requires a call to std::sort for each vector being sorted, even though they are all to be ordered according to the sorting of a single vector. I suspect there may be a way to do this by passing references to the lambda compare function, but I can't seem to make it work.

#include <numeric>
#include <algorithm>

using std;

void sort_vectors(vector<size_t> idx, vector<double> &v) {

  // sort indexes based on comparing values in v
  sort(idx.begin(), idx.end(),
       [&v](size_t i1, size_t i2) {return v[i1] < v[i2];});

  // Sort the actual vector
  sort(v.begin(), v.end());

  return idx;
}

Think algorithmically: if you have the indexes of the sorted items, you can create a second vector, and move all the items to the proper positions. This will be a trivial task (O(n)), and cheap relative to the (O(n*logn)) sort. — Alex Huszagh, Jun 30 '17 at 21:33
@AlexanderHuszagh: Ah that makes sense I guess. Speaking in terms of actual operations (rather than asymptotes), if we rearranged each vector in tandem with the sort we'd use m\*n\*logn operations (m vectors, each of length n). Just rearranging based on indices would require (m-1)\*n + n\*logn = mn - n + n\*logn which is fewer... Hell, even asymptotically, we're looking at O(mn + nlogn) < O(mnlogn) — marcman, Jun 30 '17 at 21:40

Alex Huszagh · Accepted Answer · 2017-06-30T21:45:27.867

std::sort takes iterators: Although a custom sort could likely take both indexes and the values in a single sort step, it's unlikely to be of much use (and may require different algorithms, making it slower).

Algorithm Design

Why? Because std::sort performs in O(n*logn) time. Moving elements from the sorted indexes will take O(n) time, which is relatively cheap in comparison.

Using the example from above, in the link given, we have this existing code:

using namespace std;

template <typename T>
vector<size_t> sort_indexes(const vector<T> &v) 
{

  // initialize original index locations
  vector<size_t> idx(v.size());
  iota(idx.begin(), idx.end(), 0);

  // sort indexes based on comparing values in v
  sort(idx.begin(), idx.end(),
       [&v](size_t i1, size_t i2) {return v[i1] < v[i2];});

  return idx;
}

We can now create a sorted array from these indexes, a cheap step:

template <typename T>
vector<T> sorted_array(const vector<T> &v, const vector<size_t>& i) 
{
    vector<T> out;
    out.reserve(v.size())
    for (auto j: i) {
        out.emplace_back(v[j]);
    }
}

If copying the values is too prohibitive, you can use a std::reference_wrapper to create a non-nullable wrapper.

template <typename T>
vector<reference_wrapper<const T>> sorted_array(const vector<T> &v, const vector<size_t>& i) 
{
    vector<reference_wrapper<const T>> out;
    out.reserve(v.size())
    for (auto j: i) {
        out.emplace_back(std::cref(v[j]));
    }
}

Even for large arrays, this should be pretty efficient.

Caution

Don't try to sort two arrays at once. Don't try to move items in your value array when sorting the index array. Why? Because the comparison is index-based for the value array: moving items will destroy the sort in the original array. Since moving the items to the correct position is so cheap once you have the sorted indexes, don't worry about performance here: the sort is the bottleneck.

How might this be done for an in-place sort for the additional arrays, as with `std::sort`? — marcman, Jun 30 '17 at 21:45
@marcman, in the example above, I'm getting the indexes, and then creating a copy that has the sorted data. The total time complexity is still `O(n*logn)`, so it's just as efficient as the original sort. If you want an in-place sort, just do an in-place sort. If you want the indexes of the sorted array, you can create a copy or use a facade to iterate over the elements in sorted order while still preserving the original data. — Alex Huszagh, Jun 30 '17 at 21:47
Makes sense. I'm not familiar with a facade, but I get the idea. Thanks! — marcman, Jun 30 '17 at 21:49
@marcman, basically, you could create a class wrapper that looks like a vector, but binds data and the sorta indexes as private members, and implements iteration over the indexes and dereferences to the value of the vector at that index. A lot of boilerplate for very little payoff, IMO, but I've done it before. And of course. Always think algorithmically: it saves you a huge amount of time down the road. Here, you just don't have to worry about performance since the other step is so much more prohibitive. — Alex Huszagh, Jun 30 '17 at 21:51

score 0 · Answer 2 · answered Jun 30 '17 at 22:25

The reorder of the array(s) or vector(s) according to sorted indices can be done in place in O(n) time. This example sorts two arrays using a third array of indices. During the reorder, the array of indices is restored back to it's original state of going from 0 to n-1. I manually did the iota part in this example, and it doesn't use templates, but could be easily converted to templates and vectors:

#include <algorithm>
#include <iostream>

int main()
{
int A[8] = {8,6,1,7,5,3,4,2};
char B[8] = {'h','f','a','g','e','c','d','b'};
size_t I[8];
size_t i, j, k;
int ta;
char tb;
    // create array of indices to A[]
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++)
        I[i] = i;
    // sort array of indices according to A[]
    std::sort(I, I+sizeof(I)/sizeof(I[0]),
              [&A](int i, int j) {return A[i] < A[j];});
    // reorder A[] B[] I[] according to I[]
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++){
        if(i != I[i]){
            ta = A[i];
            tb = B[i];
            k = i;
            while(i != (j = I[k])){
                A[k] = A[j];
                B[k] = B[j];
                I[k] = k;
                k = j;
            }
            A[k] = ta;
            B[k] = tb;
            I[k] = k;
        }
    }
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++)
        std::cout << A[i] << ' ';
    std::cout << std::endl;
    for(i = 0; i < sizeof(B)/sizeof(B[0]); i++)
        std::cout << B[i] << ' ';
    std::cout << std::endl;
    return 0;
}

or an array of pointers can be used instead of an array of indices, which allows a normal compare function instead of a lambda compare function.

#include <algorithm>
#include <iostream>

bool compare(const int *p0, const int *p1)
{
    return *p0 < *p1;
}

int main()
{
int A[8] = {8,6,1,7,5,3,4,2};
char B[8] = {'h','f','a','g','e','c','d','b'};
int *pA[8];
size_t i, j, k;
int ta;
char tb;
    // create array of pointers to A[]
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++)
        pA[i] = &A[i];
    // sort array of pointers according to A[]
    std::sort(pA, pA+sizeof(A)/sizeof(A[0]), compare);
    // reorder A[] B[] pA[] according to pA[]
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++){
        if(i != pA[i]-A){
            ta = A[i];
            tb = B[i];
            k = i;
            while(i != (j = pA[k]-A)){
                A[k] = A[j];
                B[k] = B[j];
                pA[k] = &A[k];
                k = j;
            }
            A[k] = ta;
            B[k] = tb;
            pA[k] = &A[k];
        }
    }
    for(i = 0; i < sizeof(A)/sizeof(A[0]); i++)
        std::cout << A[i] << ' ';
    std::cout << std::endl;
    for(i = 0; i < sizeof(B)/sizeof(B[0]); i++)
        std::cout << B[i] << ' ';
    std::cout << std::endl;
    return 0;
}

Caleth · Answer 3 · 2021-07-20T08:45:24.273

Rather than constructing an index vector, and applying it to each of the vectors you want to sort, you can instead make a range that references all the vectors to sort, and sort that by the appropriate element.

template <typename OrderBy, typename... Others>
void sort_multiple(OrderBy& order_by, Others&... others) {
    auto range = ranges::views::zip(order_by, others...);
    ranges::actions::sort(range.begin(), range.end(), std::less{}, [](auto & tuple){ return get<0>(tuple); });
}

The only unfortunate thing about this formulation is that there isn't a simple way of specifying a custom compare, because ... arguments are greedy.

struct custom_compare_t {} custom_compare;

template <typename Compare, typename OrderBy, typename... Others>
void sort_multiple(custom_compare_t, Compare compare, OrderBy& order_by, Others&... others) {
    auto range = ranges::views::zip(order_by, others...);
    ranges::actions::sort(range.begin(), range.end(), compare, [](auto & tuple){ return get<0>(tuple); });
}

Sorting multiple vectors in single sort call C++

3 Answers3

Linked