2

Is it possible to write a compile time recursive sort function in c++ using the selection algorithm?

I want to sort array x from element istart to element iend in ascending order. Array x has N elements. The data in the input array x is known only at runtime, hence the data can only be sorted at runtime. However, I want to generate the C++ code, i.e. all recursive function calls to sort_asc() at compile time. Furthermore, I want to use this code in a CUDA device function. Since CUDA is a subset of C++ with a handful of extensions, I thought this was the right place to ask. Unfortunately, I don't think CUDA supports the constexpr keyword, neither Boost nor STL.

I came up with the following code for sorting in ascending order.

// sort in ascending order.
template< int istart, int N, int iend, int iend_min_istart >
inline void sort_asc
(
    float *x
)
{
    int   min_idx = istart;
    float min_val = x[ min_idx ];
    #pragma unroll
    for( int i=istart+1; i<N; i++ ){
        if( x[ i ] < min_val ){
            min_idx = i;
            min_val = x[ i ];
        }
    }
    swap( x[ istart ], x[ min_idx ] );
    sort_asc< istart+1, N, iend, iend-(istart+1) >( x );
}   

where iend_min_istart = iend - istart. If iend_min_istart < 0 then the recursion has finished, thus we can write the termination condition as:

// sort recursion termination condition.
template< int istart, int N, int iend > 
inline void sort_asc< -1 >
(
    float *x
)
{
    // do nothing.
}

The swap function is defined as:

void d_swap
( 
    float &a, 
    float &b 
)
{
    float c = a;
    a = b;
    b = c;
}

Then I'd call the sort function as:

void main( int argc, char *argv[] )
{
    float x[] = { 3, 4, 9, 2, 7 };   // 5 element array.
    sort_asc< 0, 5, 2, 2-0 >( x );   // sort from the 1st till the 3th element.
    float x_median = cost[ 2 ];      // the 3th element is the median
}

However, this code does not compile since c++ does not support partial template specialization for functions. Furthermore, I don't know how to write this in C++ meta programming. Is there any way to make this code work?

Luc
  • 445
  • 3
  • 17
  • 1
    Related: http://stackoverflow.com/q/19559808/420683 – dyp Dec 26 '13 at 01:45
  • Instead of partial specialization, overloading can often be used. Use a `std::integral_constant` and let the compiler deduce `N`; another overload uses a fixed `N`, therefore being more specialized and preferred in overload resolution. – dyp Dec 26 '13 at 01:49
  • 1
    Boost solution (looks like a quicksort): http://www.boost.org/doc/libs/1_55_0/libs/mpl/doc/refmanual/sort.html – dyp Dec 26 '13 at 01:51
  • In addition, to make my question more specific: the data in the input array `x` is known only at runtime, hence the data can only be sorted at runtime. However, I want to generate the C++ code, i.e. all recursive function calls to SORT::sort() at compile time. Furthermore, I want to use this code in a CUDA device function. Since CUDA is a subset of C++ with a handful of extensions, I thought this was the right place to ask. Unfortunately, I don't think CUDA supports the `constexpr` keyword, neither Boost nor STL. – Luc Dec 27 '13 at 23:05
  • I think you should add this part to your question (as an edit, not just as a comment), as it is quite essential. – dyp Dec 27 '13 at 23:14

2 Answers2

5

Using selection sort this time. (Variant using merge sort.) Without any guarantees, but passes a simple test:

// generate a sequence of integers as non-type template arguments
// (a basic meta-programming tool)
template<int... Is> struct seq {};
template<int N, int... Is> struct gen_seq : gen_seq<N-1, N-1, Is...> {};
template<int... Is> struct gen_seq<0, Is...> : seq<Is...> {};


// an array type that can be returned from a function
// and has `constexpr` accessors (as opposed to C++11's `std::array`)
template<class T, int N>
struct c_array
{
    T m[N];

    constexpr T const& operator[](int p) const
    {  return m[p];  }

    constexpr T const* begin() const { return m+0; }
    constexpr T const* end() const { return m+N; }
};


// return the index of the smallest element
template<class T, int Size>
constexpr int c_min_index(c_array<T, Size> const& arr, int offset, int cur)
{
    return Size == offset ? cur :
           c_min_index(arr, offset+1, arr[cur] < arr[offset] ? cur : offset);
}

Our goal is to be able to sort at compile-time. We can use several tools for this (macros, class templates, constexpr functions), but constexpr functions tend to be the simplest in many cases.

The function c_min_index above is an example for the C++11 restrictions on constexpr functions: It can only consist of a single return statement (plus static_assert and using), and at least for some possible arguments must yield a constant expression. This implies: No loops, no assignment. So we need to use recursion here, and pass the intermediary result to the next invocation (cur).

// copy the array but with the elements at `index0` and `index1` swapped
template<class T, int Size, int... Is>
constexpr c_array<T, Size> c_swap(c_array<T, Size> const& arr,
                                  int index0, int index1, seq<Is...>)
{
    return {{arr[Is == index0 ? index1 : Is == index1 ? index0 : Is]...}};
}

As we cannot modify the argument due to constexpr function restrictions, we need to return a copy of the whole array just to swap two elements.

The function expects the Is... to be a sequence of integers 0, 1, 2 .. Size-1, as produced by a gen_seq<Size>{} (in its base class). Those integers are used to access the elements of the array arr. Some expression like arr[Is]... would produce arr[0], arr[1], arr[2] .. arr[Size-1]. Here, we swap the index by applying a conditional to the current integer: a[0 == index0 ? index1 : 0 == index1 ? index0 : 0], ..

// the selection sort algorithm
template<class T, int Size>
constexpr c_array<T, Size> c_sel_sort(c_array<T, Size> const& arr, int cur = 0)
{
    return cur == Size ? arr :
           c_sel_sort( c_swap(arr, cur, c_min_index(arr, cur, cur),
                              gen_seq<Size>{}),
                       cur+1 );
}

Again, we need to replace loop with recursion (cur). The rest is a straight-forward selection sort implementation with an awkward syntax: Starting from the index cur, we find the minimal element in the remaining array, and swap it with the element at cur. Then, we re-run the selection sort on the remaining unsorted array (by incrementing cur).

#include <iostream>
#include <iterator>

int main()
{
    // homework: write a wrapper so that C-style arrays can be passed
    //           to an overload of `c_sel_sort` ;)
    constexpr c_array<float,10> f = {{4, 7, 9, 0, 6, 2, 3, 8, 1, 5}};
    constexpr auto sorted = c_sel_sort(f);
    for(auto const& e : sorted) std::cout << e << ", ";
}
Community
  • 1
  • 1
dyp
  • 38,334
  • 13
  • 112
  • 177
  • I was rather surprised that `c_swap` could be implemented so easily. Had anticipated much more trouble there. – dyp Dec 26 '13 at 03:13
  • Interesting this compiles in clang but not g++. –  Dec 26 '13 at 03:31
  • @remyabel Yeah, I had this problem already when implementing the merge sort. I think it could be circumvented here by passing around the `c_array` itself, not a pointer. – dyp Dec 26 '13 at 03:33
  • 1
    @remyabel Voilà, used `c_array` everywhere and now g++4.8.1 is happy, too. Though I don't quite know if that was a clang++ extension or g++ bug (tend to think it's the former). I think it also got simpler now, but unfortunately tightly coupled with `c_array`. – dyp Dec 26 '13 at 03:44
  • @DyP thank you so much for you detailed code and explanation. I appreciate it. I'm going to have to delve into C++ meta programming deeper, to understand how this code works under the hood though. – Luc Dec 27 '13 at 22:02
  • @Luc You're welcome. It isn't so much metaprogramming (only the `gen_seq` part) as it is programming under the constraints of `constexpr` functions. The algorithm can also be used at run-time (e.g. by passing an array whose elements are only known at run-time), but it could be inefficient as a run-time algorithm due to the amount of copying. – dyp Dec 27 '13 at 22:23
0

I came up with the following meta code and it works.

template < class T >
__forceinline void swap
( 
    T &a, 
    T &b 
)
{
    T c = a;
    a = b;
    b = c;
}

template< int istart, int N, int iend, int iend_min_istart >
class SORT
{
    public:
        static __forceinline void sort( float *x, int *idx )
        {
            // sort code.
            int   min_idx = istart;
            float min_val = x[ min_idx ];
            for( int i=istart+1; i<N; i++ ){
                if( x[ i ] < min_val ){
                    min_idx = i;
                    min_val = x[ i ];
                }
            }
            swap(   x[ istart ],   x[ min_idx ] );
            swap( idx[ istart ], idx[ min_idx ] );
            SORT<istart+1, N, iend, iend-(istart+1)>::sort( x, idx );
        }
};

template< int istart, int N, int iend >
class SORT<istart,N,iend,-1>
{
    public:
        static __forceinline void sort( float *x, int *idx )
        {

        }
};

void main( int argc, char *argv[] )
{
    float arr[] = {1,4,2,7,5};
    int   idx[] = {0,1,2,3,4};
    SORT<0,5,3,2-0>::sort( arr, idx );
}
Luc
  • 445
  • 3
  • 17
  • 1
    But this is only compile-time if your compiler resolves it as an optimization.. which means you don't get a result that you can use in contexts where a constant expression is required. – dyp Dec 27 '13 at 17:33
  • @DyP, I have the following situation: the data in the input array `x` is known only at runtime, its length `N`, `istart`, and `iend` are known at compile time. (`arr` in the main function was merely intended as an example, but its contents are not known at compile time in my application) Therefore, I must sort the data at runtime. However, I want to generate the C++ code, i.e. all recursive function calls to `SORT::sort()`, for sorting an array of length `N` from element `istart` to element `iend` at compile time. Will the compiler do that for me? – Luc Dec 27 '13 at 22:39
  • 1
    Well that depends on if the compiler actually inlines. `__forceinline`, despite its name, doesn't force inlining (at least, not for MSVC++). But now your algorithm makes more sense to me ;) I would recommend not to use mine (`constexpr`) to do run-time sorting. Maybe I'll add another answer later that demonstrates the overloading trick I mentioned in my comments to your question. It typically simplifies this kind of partial specialization. – dyp Dec 27 '13 at 22:51