I have two random access iterables: one containing the values and the other containing a permutation index. I want to sort the values and track the changes in the permutation index.
Sample input:
std::array values{7, 45, 18, 33, 77, 96, 83, 80, 4, 51};
std::array<int, values.size()> index;
Output:
values == {4, 7, 18, 33, 45, 51, 77, 80, 83, 96};
index == {8, 0, 2, 3, 1, 9, 4, 7, 6, 5};
The simple solution
One solution is to construct a vector
of pair
s containing the value and its index,
sort this vector by the values and then extract the solution from this.
But the original vector may be large so I would like to avoid converting it twice.
I will nickname this solution vector pair sort
.
Another solution
I can just std::sort
the permutation index (which would contain 0, 1, 2... in the
beginning) using values in the values iterable. This is described at How to obtain the index permutation after the sorting
This would give me the permutation index but it won't sort the value iterable.
I could do two things from here:
std::sort()
the values.I would have to sort the data twice. I would like to avoid that. The comparison operator might not be cheap. This solution will be
double sort
.Sort the values using the permutation index.
I am not exactly sure how to do that. One solution is described here. This solution is
boost index apply sort
(because it is implemented byboost::algorithm::apply_permutation()
). This modifies the permutation index so I have to make a copy before passing it toboost::algorithm::apply_permutation()
.A simple solution that can do it in place is described here. This is
permutate in place sort
.Another solution would be to convert the index into cycles and then apply it. I haven't implemented this.
The "best" solution
I think that the most logical approach is to sort the values normally, but when the sorting algorithm swaps two values, it should also swap corresponding indexed in the permutation index.
This would sort both iterables in place at once without any added cost.
But I have no idea how to implement this. If I would be implementing the sorting algorithm,
I could just add a extra swap()
and the issue would be solved. But I'd like to reuse
std::sort()
. It gives me a reasonably fast sorting algorithm without the need
of writing it. I could also replace it with a different algorithm in the future by just
replacing std::sort()
with custom_sort()
(assuming that custom_sort()
has same
signature which is a reasonable assumption).
I am interested in swap()
. std::sort()
is using this function to swap elements.
My goal is to override this function and to make it swap the actual values and the
indexes.
One way of doing that is creating a custom iterator that would combine iterators
of values and indexes. This iterator would return a reference to a wrapper value upon
calling operator*
or operator[]
. This wrapper value would have a custom swap()
.
Benchmarks
By the way I have done some benchmarks. You can find them and the implementations at
https://github.com/meator/indexsort_impl I tried to sort a vector
containing 1 000 000
random int
s and double
s. The int
s are ranging from INT_MIN
to INT_MAX
and the double
s are ranging from 0 to 1. I ran 200 samples. Here are the results:
function | int benchmark |
double benchmark |
---|---|---|
value sort* | 114.153 ms | 127.824 ms |
index sort using values* | 177.985 ms | 206.153 ms |
vector pair sort | 122.795 ms | 139.467 ms |
double sort | 291.954 ms | 339.121 ms |
boost index apply sort | 263.827 ms | 298.844 ms |
permutate in place sort | 679.917 ms | 708.676 ms |
The "simple" vector pair sort
is surprisingly good. Maybe the compiler is smart.
Question
Is my "best" solution with iterators viable? How could I implement that? Are there some more solutions that I have dismissed? Which is the best?
EDIT: My benchmarks contained a massive flaw which invalidated all results. I have fixed that. I also added raw sort benchmarks as per Paul Sanders'es suggestion. These of course aren't indexsort.
* As mentioned above, these aren't actual indexsort.