Sorting by blocks of elements with std::sort()

Question

I have an array of edges, which is defined as a C-style array of doubles, where every 4 doubles define an edge, like this:

double *p = ...;
printf("edge1: %lf %lf %lf %lf\n", p[0], p[1], p[2], p[3]);
printf("edge2: %lf %lf %lf %lf\n", p[4], p[5], p[6], p[7]);

So I want to use std::sort() to sort it by edge length. If it was a struct Edge { double x1, y1, x2, y2; }; Edge *p;, I would be good to go.

But in this case, the double array has a block size that is not expressed by the pointer type. qsort() allows you to explicitly specify the block size, but std::sort() infers the block-size by the pointer type.

For performance reasons (both memory-usage and CPU), let's say that it's undesirable to create new arrays, or transform the array somehow. For performance reasons again, let's say that we do want to use std::sort() instead of qsort().

Is it possible to call std::sort() without wasting a single CPU cycle on transforming the data?

Possible approach:

An obvious approach is to try to force-cast the pointer:

double *p = ...;
struct Edge { double arr[4]; };
Edge *p2 = reinterpret_cast<Edge*>(p);
std::sort(...);

But how do I make sure the data is aligned properly? Also, how do I make sure it will always be aligned properly on all platforms and architectures?

Or can I use a typedef double[4] Edge;?

emperically it is, presumably because it can inline the comparison. However, its worth first profiling your application to determine that the sorting is a bottleneck worth optimising - the qsort works well. — Will, Oct 30 '09 at 12:41
Having L global doesn't affect thread safety. In fact, as you can see, qsort receives that parameter, but it copies it inside. As an aside, std::sort is faster only in certain cases, but surely not on plain C arrays. — Diego Sevilla, Oct 30 '09 at 12:59
Added note about `qsort` because of a lot number of wrong answers. — Kirill V. Lyadvinsky, Oct 30 '09 at 13:08
Why can't you copy your C array into a vector, sort, and then copy back? the sort is nlogn anyway. the extra 2 ns won't make a noticable difference. — Brian Postow, Oct 30 '09 at 14:06
Note on your first update: doesn't MPI_Recv take an output buffer pointer as a parameter? If so, there's no reason you can't provide it a pointer into a vector rather than a raw array. Not that this helps with the actual problem in trying to do this with std::sort, which of course is not where they're stored, but the fact that the size of the elements being sorted isn't known until runtime, so std::sort doesn't know how to move them. — Steve Jessop, Oct 30 '09 at 15:49
@Alexandru I have the same problem, so I rewrote the question (hopefully it's better now), and added a bounty. — sashoalm, Dec 22 '14 at 17:18
@sashoalm did you consider using a comparator function in `sort()` function? — Ali Akber, Dec 22 '14 at 19:41
@sashoalm Threw out an answer. Reinterpreting to an array should not have alignment issues - and should handle both striding and swapping correctly. — Barry, Dec 22 '14 at 19:58

sellibitze · Answer 1 · 2009-10-30T13:49:22.230

You can use a "stride iterator" for this. A "stride iterator" wraps another iterator and an integer step size. Here's a simple sketch:

template<typename Iter>
class stride_iterator
{
    ...

    stride_iterator(Iter it, difference_type step = difference_type(1))
    : it_(it), step_(step) {}

    stride_iterator& operator++() {
        std::advance(it_,step_);
        return *this;
    }

    Iter base() const { return it_; }

    difference_type step() const { return step_; }

    ...

private:
    Iter it_;
    difference_type step_;
};

Also, helper functions like these

template<typename Iter>
stride_iterator<Iter> make_stride_iter(
    Iter it,
    typename iterator_traits<Iter>::difference_type step)
{
    return stride_iterator<Iter>(it,step);
}

template<typename Iter>
stride_iterator<Iter> make_stride_iter(
    stride_iterator<Iter> it,
    typename iterator_traits<Iter>::difference_type step)
{
    return stride_iterator<Iter>(it.base(),it.step() * step);
}

should make it fairly easy to use stride iterators:

int array[N*L];
std::sort( make_stride_iter(array,L),
           make_stride_iter(array,L)+N );

Implementing the iterator adapter all by yourself (with all operators) is probably not a good idea. As Matthieu pointed out, you can safe yourself a lot of typing if you make use of Boost's iterator adapter tools, for example.

Edit: I just realized that this doesn't do what you wanted since std::sort will only exchange the first element of each block. I don't think there's an easy and portable solution for this. The problem I see is that swapping "elements" (your blocks) cannot be (easily) customized when using std::sort. You could possibly write your iterator to return a special reference type with a special swap function but I'm not sure whether the C++ standard guarantees that std::sort will use a swap function that is looked up via ADL. Your implementation may restrict it to std::swap.

I guess the best answer is still: "Just use qsort".

Isn't && a C++0x thingy? Seems like a good solution, I'll check it out. — Alexandru, Oct 30 '09 at 13:02
"Isn't && a C++0x thingy?" - Right, my mistake. I changed it. — sellibitze, Oct 30 '09 at 13:05
Ouch, that's code bloat. `boost::iterator_adaptor`: http://www.boost.org/doc/libs/1_40_0/libs/iterator/doc/iterator_adaptor.html — Matthieu M., Oct 30 '09 at 13:09
Yeah, using some iterator adapter library seems like a good idea. For that matter, I think that stride iterators should be part of std:: or at least boost:: as well. ;) — sellibitze, Oct 30 '09 at 13:23

score 2 · Accepted Answer · edited May 23 '17 at 11:50

2

How about having a reordering vector? You initialize vector with 1..N/L, pass std::sort a comparator that compares elements i1*L..i1*L+L to i2*L..i2*L+L, and when your vector is properly sorted, reorder the C array according to new order.

In response to comment: yes things get complicated, but it may just be good complication! Take a look here.

edited May 23 '17 at 11:50

Community

1
1

answered Oct 30 '09 at 14:00

Things get complicated if you don't use an additional array (which I won't). – Alexandru Oct 30 '09 at 14:56

Barry · Answer 3 · 2014-12-22T19:21:56.723

For the new question, we need to pass in sort() a kind of iterator that will not only let us compare the right things (i.e. will make sure to take 4 steps through our double[] each time instead of 1) but also swap the right things (i.e. swap 4 doubles instead of one).

We can accomplish both by simply reinterpreting our double array as if it were an array of 4 doubles. Doing this:

typedef double Edge[4];

doesn't work, since you can't assign an array, and swap will need to. But doing this:

typedef std::array<double, 4> Edge;

or, if not C++11:

struct Edge {
    double vals[4];
};

satisfies both requirements. Thus:

void sort(double* begin, double* end) {
    typedef std::array<double, 4> Edge;

    Edge* edge_begin = reinterpret_cast<Edge*>(begin);
    Edge* edge_end = reinterpret_cast<Edge*>(end);

    std::sort(edge_begin, edge_end, compare_edges);
}

bool compare_edges(const Edge& lhs, const Edge& rhs) {
    // to be implemented
}

If you're concerned about alignment, can always just assert that there's no extra padding:

static_assert(sizeof(Edge) == 4 * sizeof(double), "uh oh");

Thanks, I just saw the answer. Can we be sure that such a struct is always aligned properly? Is it defined by the standard, or is it left up to the implementation? Also, `static_assert` requires C++11 as well, doesn't it? — sashoalm, Dec 22 '14 at 22:02
@sashoalm I believe so, though I can't give you a quote. `Edge` should have the same alignment as `double` - so there shouldn't be any padding. If it's not, I'd really like to know why not. And you can use `BOOST_STATIC_ASSERT()` instead. — Barry, Dec 22 '14 at 22:44
'#pragma pack' is supported by VC++ and GCC - I wouldn't use it, though, if possible — dwn, Dec 23 '14 at 22:21

score 1 · Answer 4 · answered Oct 30 '09 at 13:59

I don't remember exactly how to do this, but if you can fake anonymous functions, then you can make a comp(L) function that returns the version of comp for arrays of length L... that way L becomes a parameter, not a global, and you can use qsort. As others mentioned, except in the case where your array is already sorted, or backwards or something, qsort is going to be pretty much just as fast as any other algorithm. (there's a reason it's called quicksort after all...)

score 1 · Answer 5 · answered Oct 31 '09 at 15:39

It's not part of any ANSI, ISO, or POSIX standard, but some systems provide the qsort_r() function, which allows you to pass an extra context parameter to the comparison function. You can then do something like this:

int comp(void *thunk, const void *a, const void *b)
{
    int L = (int)thunk;
    // compare a and b as you would normally with a qsort comparison function
}

qsort_r(array, N, sizeof(int) * L, (void *)L, comp);

Alternatively, if you don't have qsort_r, you can use the callback(3) package from the ffcall library to create closures at runtime. Example:

#include <callback.h>
void comp_base(void *data, va_alist alist)
{
    va_start_int(alist);  // return type will be int

    int L = (int)data;
    const void *a = va_arg_ptr(alist, const void*);
    const void *b = va_arg_ptr(alist, const void*);

    // Now that we know L, compare
    int return_value = comp(a, b, L);

    va_return_int(alist, return_value);  // return return_value
}

...    

// In a function somewhere
typedef int (*compare_func)(const void*, const void*);

// Create some closures with different L values
compare_func comp1 = (compare_func)alloc_callback(&comp_base, (void *)L1);
compare_func comp2 = (compare_func)alloc_callback(&comp_base, (void *)L2);
...
// Use comp1 & comp2, e.g. as parameters to qsort
...
free_callback(comp1);
free_callback(comp2);

Note that the callback library is threadsafe, since all parameters are passed on the stack or in registers. The library takes care of allocating memory, making sure that memory is executable, and flushing the instruction cache if necessary to allow dynamically generated code (that is, the closure) to be executed at runtime. It supposedly works on a large variety of systems, but it's also quite possible that it won't work on yours, either due to bugs or lack of implementation.

Also note that this adds a little bit of overhead to the function call. Each call to comp_base() above has to unpack its arguments from the list passed it (which is in a highly platform-dependent format) and stuff its return value back in. Most of the time, this overhead is miniscule, but for a comparison function where the actual work performed is very small and which will get called many, many times during a call to qsort(), the overhead is very significant.

score 0 · Answer 6 · answered Oct 30 '09 at 12:31

0

std::array< std::array<int, L>, N > array;
// or std::vector< std::vector<int> > if N*L is not a constant
std::sort( array.begin(), array.end() );

answered Oct 30 '09 at 12:31

Kirill V. Lyadvinsky

97,037
24
136
212

4

Then use `qsort`, using std::sort is not a faster way. – Kirill V. Lyadvinsky Oct 30 '09 at 12:37

Will · Answer 7 · 2009-10-30T13:26:38.580

namespace
{
    struct NewCompare
    {
        bool operator()( const int a, const int b ) const
        {
            return a < b;
        }

    };
}

std::sort(array+start,array+start+L,NewCompare);

Do test with std::stable_sort() on realistic data-sets - for some data mixes its substantially faster!

On many compilers (GCC iirc) there's a nasty bite: the std::sort() template asserts that the comparator is correct by testing it TWICE, once reversed, to ensure the result is reversed! This will absolutely completely kill performance for moderate datasets in normal builds. The solution is something like this:

#ifdef NDEBUG
  #define WAS_NDEBUG
  #undef NDEBUG
#endif
#define NDEBUG
#include <algorithm>
#ifdef WAS_NDEBUG
  #undef WAS_NDEBUG
#else
  #undef NDEBUG
#endif

Adapted from this excellent blog entry: http://www.tilander.org/aurora/2007/12/comparing-stdsort-and-qsort.html

i don't get it. how does this handle the block nature of array? — Matt Joiner, Nov 02 '09 at 00:09

score 0 · Answer 8 · edited May 23 '17 at 11:50

I'm not sure if you can achieve the same result without a lot more work. std::sort() is made to sort sequences of elements defined by two random access iterators. Unfortunately, it determines the type of the element from the iterator. For example:

std::sort(&array[0], &array[N + L]);

will sort all of the elements of array. The problem is that it assumes that the subscripting, increment, decrement, and other indexing operators of the iterator step over elements of the sequence. I believe that the only way that you can sort slices of the array (I think that this is what you are after), is to write an iterator that indexes based on L. This is what sellibitze has done in the stride_iterator answer.

score 0 · Answer 9 · answered Oct 31 '09 at 20:26

Arkadiy has the right idea. You can sort in place if you create an array of pointers and sort that:

#define NN 7
#define LL 4

int array[NN*LL] = {
    3, 5, 5, 5,
    3, 6, 6, 6,
    4, 4, 4, 4,
    4, 3, 3, 3,
    2, 2, 2, 2,
    2, 0, 0, 0,
    1, 1, 1, 1
};

struct IntPtrArrayComp {
    int length;
    IntPtrArrayComp(int len) : length(len) {}
    bool operator()(int* const & a, int* const & b) {
        for (int i = 0; i < length; ++i) {
            if (a[i] < b[i]) return true;
            else if (a[i] > b[i]) return false;
        }
        return false;
    }
};

void sortArrayInPlace(int* array, int number, int length)
{
    int** ptrs = new int*[number];
    int** span = ptrs;
    for (int* a = array; a < array+number*length; a+=length) {
        *span++ = a;
    }
    std::sort(ptrs, ptrs+number, IntPtrArrayComp(length));
    int* buf = new int[number];
    for (int n = 0; n < number; ++n) {
        int offset = (ptrs[n] - array)/length;
        if (offset == n) continue;

        // swap
        int* a_n = array+n*length;
        std::move(a_n, a_n+length, buf);
        std::move(ptrs[n], ptrs[n]+length, a_n);
        std::move(buf, buf+length, ptrs[n]);

        // find what is pointing to a_n and point it 
        // to where the data was move to
        int find = 0;
        for (int i = n+1; i < number; ++i) {
            if (ptrs[i] == a_n) {
                find = i;
                break;
            }
        }
        ptrs[find] = ptrs[n];
    }
    delete[] buf;
    delete[] ptrs;
}

int main()
{
    for (int n = 0; n< NN; ++n) {
        for (int l = 0; l < LL; ++l) {
            std::cout << array[n*LL+l];
        }
        std::cout << std::endl;
    }
    std::cout << "----" << std::endl;
    sortArrayInPlace(array, NN, LL);
    for (int n = 0; n< NN; ++n) {
        for (int l = 0; l < LL; ++l) {
            std::cout << array[n*LL+l];
        }
        std::cout << std::endl;
    }
    return 0;
}

Output:

replace int* buf = new int[number]; with int* buf = new int[length]; Arkady's answer was correct, but I wanted a solution without a second array. I'm going to accept his answer as being the best solution 'couse he came up with the original idea. — Alexandru, Oct 31 '09 at 22:49

score 0 · Answer 10 · answered Nov 02 '09 at 04:24

0

A lot of these answers seem like overkill. If you really have to do it C++ style, using jmucchiello's example:

template <int Length>
struct Block
{
    int n_[Length];

    bool operator <(Block const &rhs) const
    {
        for (int i(0); i < Length; ++i)
        {
            if (n_[i] < rhs.n_[i])
                return true;
            else if (n_[i] > rhs.n_[i])
                return false;
        }
        return false;
    }
};

and then sort with:

sort((Block<4> *)&array[0], (Block<4> *)&array[NN]);

It doesn't have to be any more complicated.

answered Nov 02 '09 at 04:24

Matt Joiner

112,946
110
377
526

This requires the length to be known at compile time, though. – CB Bailey Nov 02 '09 at 05:46
not hard to modify otherwise. from the example i based it on, it is. – Matt Joiner Nov 02 '09 at 07:51

Sorting by blocks of elements with std::sort()

Possible approach:

10 Answers10

Linked