Simultaneous in-place std::sort on a vector of keys and a vector of values

Question

I have a vector<uint64_t> keys and a vector<char> vals, both of size N. I would like to sort keys and vals based on entries in keys.

An obvious solution is copying into a vector<pair<uint64_t, char>>, sorting that, and copying the sorted data back out, but I would like to avoid copying, and I would like to avoid the alignment padding: sizeof(pair<uint64_t, char>) is 2*sizeof(uint64_t), or 16 bytes, due to alignment; much more than the 9 bytes needed.

In other words, although the following C++11 implementation is correct, it is not efficient enough:

#include <algorithm>
#include <tuple>
using namespace std;
void aux_sort(vector<uint64_t> & k, vector<char> & v) {
    vector<pair<uint64_t, char> > kv(k.size());
    for (size_t i = 0; i < k.size(); ++i) kv[i] = make_pair(k[i], v[i]);
    sort(kv.begin(), kv.end());
    for (size_t i = 0; i < k.size(); ++i) tie(k[i], v[i]) = kv[i];
}

Although the following C++11 implementation is correct, I want to use std::sort instead of hand-coding my own sorting algorithm:

#include <algorithm>
using namespace std;
void aux_sort(vector<uint64_t> & k, vector<char> & v) {
    for (size_t i = 0; i < k.size(); ++i)
        for (size_t j = i; j--;)
            if (k[j] > k[j + 1]) {
                iter_swap(&k[j], &k[j + 1]);
                iter_swap(&v[j], &v[j + 1]);
            }
}

(Edit to add, in response to @kfsone) Although the following implementation is correct, it is not in-place, since permutation according to indices needs a copy (or alternatively, a prohibitively complex linear time in-place permutation algorithm that I am not going to implement):

#include <algorithm>
#include <tuple>
using namespace std;
void aux_sort(vector<uint64_t> & k, vector<char> & v) {
    vector<size_t> indices(k.size());
    iota(indices.begin(), indices.end(), 0);
    sort(indices.begin(), indices.end(),
        [&](size_t a, size_t b) { return k[a] < k[b]; });
    vector<uint64_t> k2 = k;
    vector<char> v2 = v;
    for (size_t i = 0; i < k.size(); ++i)
        tie(k[i], v[i]) = make_pair(k2[indices[i]], v2[indices[i]]);
}

What is the easiest way to apply STL algorithms such as std::sort to a sequence of key/value-pairs in-place, with keys and values stored in separate vectors?

Background: My application is reading large (40 000 by 40 000) rasters that represent terrains, one row at a time. One raster assigns each cell a label between 0 and 10 000 000 such that labels are contiguous, and another raster assigns each cell a value between 0 and 255. I want to sum the values for each label in an efficient manner, and I think the fastest way is to sort the label row, and for each swap during the sort, apply the same swap in the value row. I want to avoid coding std::sort, std::set_intersection and others by hand.

It's hard if not impossible to go below 16 extra bytes per element on a 64 bit machine. — Shoe, Jun 07 '15 at 10:26
You could always create a vector of indexes to the `vector k`, sort the indexes and then use that as a jump table to `vector v` values. — kfsone, Jun 07 '15 at 10:47
@kfsone: Yes, that is a third option I have considered, but in order to permute `k` and `v` into sorted order, I then need to copy them into new vectors `k2` and `v2`, which is not in-place. — Mathias Rav, Jun 07 '15 at 10:58
What is the purpose of all your restrictions on the solution? Are you trying to save time? Memory? — Jørgen Fogh, Jun 07 '15 at 11:24
I think you could create a "facade" object containing references to `keys` and `vals` (or even better, direct links to the underlining memory locations) , implement iterators, comparison and swap, and then apply `sort` on it. — Karoly Horvath, Jun 07 '15 at 11:29
@JørgenFogh: I feel like it ought to be possible with the STL. In my application, I will pick the solution that is fastest, but I would expect that the fastest solution uses the STL in an in-place solution. I just cannot figure out how to get the STL to do what I want. — Mathias Rav, Jun 07 '15 at 11:32
@KarolyHorvath That's exactly what I would do. I think you should post that as an answer. — , Jun 07 '15 at 11:42
Related/duplicate: [Sorting zipped (locked) containers in C++ using boost or the STL](http://stackoverflow.com/q/13840998/) — dyp, Jun 07 '15 at 11:54
@hvd: ATM I'm just trying to put it together as an excersie (lovely error messages..), but my C++ knowledge is not that solid... that code in the duplicate is sooo much more concise. Feel free to post an answer ;) — Karoly Horvath, Jun 07 '15 at 12:01
@AviGinsburg: that's how it's called in python. But until you mentioned it I haven't thought about it that way. thx. — Karoly Horvath, Jun 07 '15 at 12:03
@KarolyHorvath I would, but Yakk already posted something even slightly better, so I just upvoted that instead. :) — , Jun 07 '15 at 18:22

score 4 · Answer 1 · edited May 23 '17 at 12:10

4

Range adapters. The most direct route would be a zip range, that takes two equal length ranges over T and U respectively, and produces a range over pair<T&,U&>. (containers are a kind of range -- a range that owns its contents)

You then sort this by .first (or use default sort, where .second determines ties).

The range is never a container, the wrapping into pairs happens on the fly with each dereference of the zip iterator.

boost has a zip iterators and zip ranges, but you can write them yourself. The boost iterators/ranges may be read only, but the link also contains an implementation of zipping that is not, and maybe boost has upgraded.

edited May 23 '17 at 12:10

Community

1
1

answered Jun 07 '15 at 13:35

Yakk - Adam Nevraumont

262,606
27
330
524

2

Related: [Unzip in C++ Range-v3 library](http://stackoverflow.com/a/30419985/923854). – Casey Jun 07 '15 at 17:06
2

@kfsone The presence or absence of code is not some magical property that indicates all by itself whether an answer is useful or useless. The approach in this answer is a good one, and the answer has enough details that you can write working code based on this answer. – Jun 07 '15 at 18:20

score 1 · Answer 2 · answered Jun 07 '15 at 11:33

1

You can use the thrust library and use the sort by key function. Not STL, but has the (dubious) advantage of being easily ported to a nVIdia GPU.

answered Jun 07 '15 at 11:33

Avi Ginsburg

10,323
3
29
56

Mathias Rav · Answer 3 · 2016-08-03T07:43:41.003

In fact it is easy to permute the input vectors according to indices in-place (contrary to the claim in the question):

#include <algorithm>
#include <tuple>
using namespace std;
void aux_sort(vector<uint64_t> & k, vector<char> & v) {
    vector<size_t> indices(k.size());
    iota(indices.begin(), indices.end(), 0);
    sort(indices.begin(), indices.end(),
        [&](size_t a, size_t b) { return k[a] < k[b]; });
    for (size_t i = 0; i < k.size(); ++i)
        while (indices[i] != i) {
            swap(k[i], k[indices[i]]);
            swap(v[i], v[indices[i]]);
            swap(indices[i], indices[indices[i]]);
        }
}

However, this solution is perhaps undesirable since it incurs many more cache-faults than the sorting itself as the input is traversed in the order of indices, which can possibly incur one cache fault per element. On the other hand, quicksort incurs much fewer cache faults (O(n/B log n/M) when pivots are random, where B is the size of a cache line and M is the size of the cache).

score 0 · Answer 4 · answered Jun 07 '15 at 11:38

I do not believe that it is possible to satisfy all the constraints that you have set up for the solution. It is almost certainly possible to hack the STL to sort the arrays. However, the solution is likely to be both clumsy and slower than just copying the data, sorting it, and copying it back.

If you have the option, you might want to consider just storing the data in a single vector to begin with.

Simultaneous in-place std::sort on a vector of keys and a vector of values

4 Answers4