19

I wonder how to get something like this:

  1. Write

    copy(a, b, 2, 3)
    
  2. And then get

    a[2] = b[2];
    a[3] = b[3];
    a[4] = b[4];
    

I know that C #defines can't be used recursively to get that effect. But I'm using C++, so I suppose that template meta-programming might be appropriate.

I know there is a Boost library for that, but I only want that "simple" trick, and Boost is too "messy".

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
cibercitizen1
  • 20,944
  • 16
  • 72
  • 95
  • 3
    Are your before and after measurements of performance different enough that this is the place to focus your efforts? – Bill Mar 04 '10 at 19:38
  • 18
    @Bill: yeah, because we wouldn't want people just going around learning techniques for the sake of "education" or "curiosity" ;-) – Steve Jessop Mar 04 '10 at 19:46
  • actually you can accomplish this using boost preprocessor. I did something similar to implement python unpack semantics in C++, i.e. `unpack((const int a,b,c), abcd);` – Anycorn Mar 04 '10 at 19:56
  • The motivation is both for education and curiosity, so I would like to get the source code for that, not using the boost lib. – cibercitizen1 Mar 04 '10 at 20:10
  • Just a thought: when using `copy` you usually have `source` and then `destination` so I would understand `copy(a,b,2,3)` as copying from `a` to `b` (like `std::copy` does) – Matthieu M. Mar 05 '10 at 07:19
  • check this http://stackoverflow.com/questions/2380143/c-metaprograming-with-templates-versus-inlining – Andrey Mar 04 '10 at 19:39

9 Answers9

27

C++ meta-programming is recursive.

Think of your problem in terms of recursion. Implement terminal case and non-terminal cases.

Your terminal case can be either 0 or one. Pass limits as template parameters. Use a structure/class because they allow partial specialization and other neat things:

template<int from, int to>
struct copy {
    static void apply(source, destination) {
        source[from] = destination[from];
        copy<from+1,to>:: apply(source, destination);
    }
};

// Terminal case
template<int from>
struct copy<from, from> {
    static void apply(source, destination) {
        source[from] = destination[from];
    }
};
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Anycorn
  • 50,217
  • 42
  • 167
  • 261
  • 1
    `source` and `destination` need to be declared; this doesn't compile as it stands. – CB Bailey Mar 04 '10 at 20:41
  • 1
    Hey, that's the same code I gave in my question: http://stackoverflow.com/questions/2380143/c-metaprograming-with-templates-versus-inlining Does it means, this is the only way to do it *using templates*. – cibercitizen1 Mar 04 '10 at 22:33
  • 1
    @CharlesBailey Ummm... `source` and `destination` are arguments? – Jonathan Mee Mar 09 '16 at 12:16
  • Doesn't this copy the 'destination' to the 'source'? Also, can you show a compilable example of the construct in use? I tried to use `copy<2, 3>(a, b);` and G++ 6.2.0 wasn't happy, even after I use `static void apply(int source[], int destination[])` in the function definitions. It gives error `no matching function for call to 'copy<2, 3>::copy(int [6], int [6])'`. It lists 3 candidates : `: constexpr copy<2, 3>::copy()`, `constexpr copy<2, 3>::copy(const copy<2, 3>&)`, and `constexpr copy<2, 3>::copy(copy<2, 3>&&)` and rejects them all because they expect 0 or 1 arguments and are given 2. – Jonathan Leffler Jan 06 '17 at 19:52
26

The most straightforward solution to this is to write a loop where the start and end values are known:

for(int i = 2; i <= 4; i++) {
  a[i]=b[i]; 
}

I think this is better than any sort of template/runtime-call mixture: The loop as written is completely clear to the compilers' optimizer, and there are no levels of function calls to dig through just to see what's going on.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Johannes Schaub - litb
  • 496,577
  • 130
  • 894
  • 1,212
  • 5
    Plus, many compilers will unroll this loop when optimization is turned on. – John Dibling Mar 04 '10 at 20:37
  • Yes I agree ! And it is trivial to write a define to be used like copy(a,b,2,4) – cibercitizen1 Mar 04 '10 at 22:35
  • 47
    This is the accepted answer? `"how to unroll a loop with templates"` and the answer is don't even try? Maybe you're compiler will unroll this but not all. – Inverse Feb 13 '11 at 06:51
  • 8
    @Inverse a compiler that cannot unroll this also cannot inline function calls that result from the various template solutions presented. I *did* try, and this is what I came up, which I believe yields to loop unrolling. – Johannes Schaub - litb Feb 13 '11 at 22:12
  • 1
    That's cool. I was hoping for a general solution to better control unrolling in the general case. I thought you were being snarky for some reason ;) but it is fine for a fixed loop. – Inverse Feb 14 '11 at 04:59
  • 1
    Mind you that, in CUDA, loop unrolling can sometimes make the difference between the use of different memory spaces for the data. – einpoklum Feb 29 '16 at 20:40
  • this is not the answer at all. The question is asking how to unroll a loop in C++ level, but the compiler is to unroll it in the compiled machine code level. The former is differently required. For example, I am writing some HLS task that requires C++ level unrolling. – justin.yqyang Jan 04 '17 at 14:57
  • @justin.yqyang you are right. since I cannot delete it (because it was accepted), i will demolish the answer and refer to a comment EDIT: ... and make it community wiki – Johannes Schaub - litb Jan 04 '17 at 19:08
  • @justin.yqyang to be more precise.. well then the answer is "it is impossible". he says "and get...". but he cannot have one line of code that at the same time is equivalent in C++-sense to three line of codes. The one line is a function call and the other three lines are assignments. The equivalence relies on optimizations. In his case, and in my case likewise. **Perhaps deleting it was a bit too fast. Not sure, but I'll let the community decide!** – Johannes Schaub - litb Jan 04 '17 at 19:14
  • Given the votes on the post and that it's an accepted solution to the OP, I don't see any reason to delete this. It obviously is seen as a solution to the overall problem by many. I don't see any reason to delete it just because it is not the most pure solution to the question. That's what voting and posting other answers is for. – Martijn Pieters Jan 08 '17 at 10:33
  • @MartijnPieters it is now an answer for it because another guy edited the question. i will further edit it to make it clearer – Johannes Schaub - litb Jan 08 '17 at 11:35
4

You can probably do something like the following. Depending on your compiler and the optimziation settings that you use you may get the effect that you are looking for.

Be aware that for small objects like char it may well be slower than a std::copy or a memcpy and that for larger objects the cost of a loop is likely to be insignificant compared to the copies going on in any case.

#include <cstddef>

template<std::size_t base, std::size_t count, class T, class U>
struct copy_helper
{
    static void copy(T dst, U src)
    {
        dst[base] = src[base];
        copy_helper<base + 1, count - 1, T, U>::copy(dst, src);
    }
};

template<std::size_t base, class T, class U>
struct copy_helper<base, 0, T, U>
{
    static void copy(T, U)
    {
    }
};

template<std::size_t base, std::size_t count, class T, class U>
void copy(T dst, U src)
{
    copy_helper<base, count, T, U>::copy(dst, src);
}

template void copy<5, 9, char*, const char*>(char*, const char*);

#include <iostream>
#include <ostream>

int main()
{
    const char test2[] = "     , World\n";
    char test[14] = "Hello";

    copy<5, 9>(test, test2);

    std::cout << test;

    return 0;
}
CB Bailey
  • 755,051
  • 104
  • 632
  • 656
  • Hey, that's the same code I gave in my question: stackoverflow.com/questions/2380143/… Does it means, this is the only way to do it using templates? I'm starting to believe so. – cibercitizen1 Mar 04 '10 at 22:37
  • @cibercitizen1: I wrote this independently, but to a well known pattern. – CB Bailey Mar 04 '10 at 22:56
3

It's important to realize that the compiler is very smart, and that tricking it to unroll loops using template metaprogramming will probably set you back further that it gets you forward.

To get the bottom out of your optimizations: keep an eye on the disassembly. This will hopefully teach you more than throwing templates at the problem.

And note, like Johannes said: if the compiler can see that you are running a loop for a fixed number of times (or a fixed multiple of times like 4x variable), it can create code very close to optimal.

Jan
  • 1,807
  • 13
  • 26
2

Loop unrolloing using meta-programming can be used to create constexpr (I have not measured times). I have an example where it can be used to make the function combination(n, k) a cosntexpr:

template <size_t N> struct iterate_forward {
    template <class operation> static auto eval(const operation & op)
    {
        iterate_forward<N-1>::eval(op);
        return op(N-1);
    }
};
template <> struct iterate_forward<1>
{
    template <class operation> static auto eval(const operation & op)
    { return op(0); }
};
template <> struct iterate_forward<0>
{
    template <class operation> static auto eval(const operation & op) {}
};
template <size_t N, size_t K> constexpr size_t COMB()
{
    struct ratio
    {
        size_t operator () (size_t i) const { return (res *= (N-i) / (i+1)); }
        mutable size_t res = 1;
    };
    return iterate_forward<K>::eval(ratio());
} 
Marialena
  • 817
  • 8
  • 31
Marco
  • 21
  • 1
1

From http://www.sgi.com/tech/stl/Vector.html:

template <class InputIterator>
vector(InputIterator, InputIterator)

Creates a vector with a copy of a range. 
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Anthony Labarre
  • 2,745
  • 1
  • 28
  • 39
  • 2
    I'm sorry, but no. I'm looking for something as close as possible to a search -copy(a,b,2,3)- and replace by a[2]=b[2]; a[3]=b[3]; a[4]=b[4]; – cibercitizen1 Mar 04 '10 at 20:17
1

It doesn't use templates and it's not a "complete" unrolling, but you can partially unroll the loop with something like this:

void copy (SomeType* a, SomeType* b, int start_index, int num_items) {
    int i = start_index;

    while (num_items > 4) {
            a[i+0] = b[i+0];
            a[i+1] = b[i+1];
            a[i+2] = b[i+2];
            a[i+3] = b[i+3];
            i += 4;
            num_items -= 4;
    }
    while (num_items > 0) {
            a[i] = b[i];
            ++i;
            --num_items;
    }
}

Now in this particular example, the extra computations involved will probably outweigh the benefits from only unrolling four elements at a time. You should get an increasing benefit from an increasing number of elements inside the top loop (throughout the function, replace 4 with however many elements you are copying inside each manually-unrolled iteration).

bta
  • 43,959
  • 6
  • 69
  • 99
1
template <int begin, int end> struct copy_;

template <int end> struct copy_<end, end> {
  template <typename T> static void execute(T& a, T& b)
  {
    a[end] = b[end];
  }
};

template <int begin, int end> struct copy_<begin, end> {
  template <typename T> static void execute(T& a, T& b)
  {
    a[begin] = b[begin];
    copy_<begin+1, end>::execute(a, b);
  }
};

template <int begin, int how_many> struct copy {
  template <typename T> static void execute(T& a, T& b)
  {
    copy_<begin, begin+how_many-1>::execute(a, b);
  }
};

copy<2, 3>::execute(a, b);
Corwin
  • 575
  • 2
  • 9
  • Hey, that's the same code I gave in my question: stackoverflow.com/questions/2380143/… Does it means, this is the only way to do it using templates? This is the 3rd code in that direction, so it seems clear. – cibercitizen1 Mar 04 '10 at 22:39
  • @cibercitizen1 Na, all it means is people are too lazy to do anything but a search and make a question out of the answer.... – Archival Oct 21 '12 at 13:43
0
#define tl template
#define tn typename
#define st struct
#define cl class
#define us using


    template<tn A> st Typ { using type = A; };
    tl<tn T> using GetT = tn T::type;
    tl<tn F, tn ...As> us apply = tn F::tl apply<As...>;
    tl<tn, tn, tn ...> st LoopElements;
    tl<tn, tn> st Loop;
    tl<tn, tn, tn> st VLoop_i;
    tl<tn Sq, tn MF> us VLoop = VLoop_i<GetT<Sq>, Sq, MF>;

    //
    // TY
    //
    template<tn T> struct Ty {
        template<T ...> struct Pack : Typ<T> {};
        tl<tn ...> st Concat_i; tl<tn ...P> us Concat = GetT<Concat_i<P...>>;
        tl<T, int64_t> st Seq_i; tl<T f, T t> us Seq = GetT<Seq_i<f, ((int64_t)(t - f))>>; tl<int64_t, tn> st add;

        template<tl<T ...> tn C, T ...vs, tl<T ...> tn D, T ...ws, tn ...R> st Concat_i<C<vs...>, D<ws...>, R...> : Typ<Concat<C<vs..., ws...>, R...> >{};
        template<tl<T ...> tn C, T ...vs> st Concat_i<C<vs...>> : Typ<C<vs...> >{};

        template<int64_t x, T ...vs> struct add<x, Pack<vs...>> : Typ<Pack<((T)(vs + x))...>> {};
        template<T f, int64_t c> class Seq_i {
            using A = tn Seq_i<f, c/2>::type;
            using B = tn add<c/2, A>::type;
            using C = tn Seq_i<f + c / 2 * 2, c & 1>::type;
        public:
            using type = Concat<A, B, C>;
        };
        tl<T f> st Seq_i<f, 0> : Typ<Pack<>> {};        
        tl<T f> st Seq_i<f, 1> : Typ<Pack<f>> {};
        tl<T f> st Seq_i<f, -1> : Typ<Pack<f>> {};
    };

    //
    // LOOP
    template<size_t i, tn F, tn T> struct LoopElement { LoopElement() { apply<F, T>(); }; };
    template<size_t ...is, tn F, tn ...P> struct LoopElements<Ty<size_t>::Pack<is...>, F, P...> : LoopElement<is, F, P>... {};
    template<tn F, tl<tn ...> cl C, tn ...P> struct Loop<F, C<P...>> : LoopElements<Ty<size_t>::Seq<0, sizeof...(P)>, F, P...> {};

    template<tn T, tl<T ...> cl ST, T ...vs, tn MF> struct VLoop_i<T, ST<vs...>, MF> : LoopElements<Ty<size_t>::Seq<0, sizeof...(vs)>, MF, Val<T, vs>...> {};
  • 5
    Welcome to Stack Overflow! While this code snippet may solve the question, [including an explanation](//meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, this reduces the readability of both the code and the explanations! – kayess Jan 13 '17 at 13:08