2

I have the following code:

template <int size>
inline uint hashfn( const char* pStr )
{
   uint result = *pStr;
   switch ( size )
   {
      case 10:
         result *= 4;
         result += *pStr;
      case 9:
         result *= 4;
         result += *pStr;
      ...
      ...
      case 2:
         result *= 4;
         result += *pStr;
   }
   return result;
}

This code is a hash function for DNA sequences of certain lenghts where the length is a template parameter. It is an unrolled loop with a switch statement to jump in at the right spot. The size is however a constant since it is a template parameter. Could I specialize this for certain size values? Maybe with something like:

template <int 2>
inline uint hashfn( const char* pStr )
{
  uint result = *pStr;
  result *= 4;
  ++pStr;
  result += *pStr;
  return result;
}
Jeroen Dirks
  • 7,705
  • 12
  • 50
  • 70
  • Where is your loop? Are you sure multiplying by 4 is a good idea? It should probably be a prime. – ebo Jul 08 '09 at 17:25
  • The loop is already unrolled. This is a very specialized hash function. Character values are just 0,1,2,3 . – Jeroen Dirks Jul 08 '09 at 17:30
  • I would no change your code, and look at the generated code first. Since the number is constant, the compiler knows at compile time where it jumps to, and there is a good chance that it just optimizes the switch out altogether. It just looks like a simple "goto x;" statement to the compiler, i think. – Johannes Schaub - litb Jul 08 '09 at 17:44
  • Clever, but not so readable without a comment. You do know the compiler will automagically do loop unrolling for you. Have you measured the speed increase in your design against what the compiler can do? – Martin York Jul 08 '09 at 18:52
  • I'd probably get slaughtered for not answering your question and instead advice you to use polymorphism for this, so I'm suggesting this in a comment instead :) – rtn Jul 08 '09 at 19:24
  • 1
    Maybe my compiler is not so smart (VC++.Net 2003) but it was not unrolling the loop for the longer sequences. On my computer with my compiler there was a 20% speed improvement by manually unrolling the loop. – Jeroen Dirks Jul 08 '09 at 20:29

7 Answers7

3

I would tend to do it recursively with templates.

E.g. :

template<class TOp,int factor>
struct recursive_unroll {
    __forceinline static void result( TOp& k ) {
        k();
        recursive_unroll<TOp,factor-1>::result( k );
    }
};

template<class TOp>
struct recursive_unroll<TOp,0> {
    __forceinline static void result( TOp& k ) {}
};


struct  op    {
    op( const char* s ) : res( 0 ), pStr( s )   {}

    unsigned int res;
    const char* pStr;        
    __forceinline void  operator()()  {
        res *= 4;
        res += *pStr;
        ++pStr;
        //std::cout << res << std::endl;
    }
};

char str[] = "dasjlfkhaskjfdhkljhsdaru899weiu";

int _tmain(int argc, _TCHAR* argv[])
{
    op tmp( str );
    recursive_unroll<op,sizeof( str ) >::result( tmp );
    std::cout << tmp.res << std::endl;
    return 0;
}

This produces optimal code for me. Without __forceinline the code is not properly inlined.

You should always bench your code before using such optimizations. And then you should look at the assembly and decipher what your compiler already does for you. But in this case, it seems to be an boost (for me).


__forceinline is a Microsoft Visual Studio specific extension. The compiler should generate optimal code, but for this it doesn't. So here I used this extension.

Christopher
  • 8,912
  • 3
  • 33
  • 38
1

Please read up about loop unrolling on wikipedia. The whole point is to save comparisons on the loop variable. Did you profile the code? I do not see how this can save you cycles compared to a loop.

Any modern compiler should completely unroll a loop with a small static loop count for you.

I also hope you don't use the hashes for modulo-based hash tables, as you will lose the upper bits of your hash.

ebo
  • 8,985
  • 3
  • 31
  • 37
  • Actually, I've improved performance by rolling loops up tight. I think it had something to do with cache locality. – David Thornley Jul 08 '09 at 20:32
  • Improbable, as long as you are loading at least ints and do not change the order of the memory accesses. I would guess it had to do with branch prediction. A correct guess can make a great difference with low round trip times. Also: Are you sure your compiler did not unroll the loop again? – ebo Jul 08 '09 at 20:37
0

There's no loop involved in your original code, so it cannot be unrolled.

Paul Sonier
  • 38,903
  • 3
  • 77
  • 117
0

You are allowed to specialize like that. However, I think you are focusing on the optimization of unrolling the loop. Until you profile your code and show that it is a bottleneck, you are better off not prematurely optimizing.

rlbond
  • 65,341
  • 56
  • 178
  • 228
0

I have the following code: ... (code that uses Duff's device) ...

It is an unrolled loop with a switch statement to jump in at the right spot. The size is however a constant since it is a template parameter. Could I specialize this for certain size values?

You certainly can, but there will be a lot of boilerplate involved, for each possible size. With appropriate levels of abstraction, and use of the Boost metaprogramming and preprocessor libraries you might even get this down to something reasonable.

Your maintenance costs will go up, so I'll echo others and recommend you make sure you really should do this now that you know you can.

Community
  • 1
  • 1
Max Lybbert
  • 19,717
  • 4
  • 46
  • 69
0

I dont know if the semantics is right, but in this case the function template itself could be recursive:

template <int size>
inline uint hashfn( const char* pStr ) {
    uint result = *pStr;
    result *= 4;
    return result + hashfn<size-1>(++pStr);
}

//case 0 stops
template <>
inline uint hashfn<0>( const char* pStr ) {
    return 0;
}

As Christopher said, just be sure that it gets inlined...

ogoid
  • 405
  • 4
  • 8
  • as the saying goes... for every problem there is a solution that is both simply and wrong... the code I posted won't work as is because you would need to pass the previous calculation to the next iteration, not just add them as I've done. This could be made by an functor as sugested by Christopher or by overloading hashfn to receive the previous step result. – ogoid Jul 28 '09 at 04:10
0

For code

template <size_t N>
void
function (float *a, float *b, float *c)
{
  for (size_t i = 0; i < N; ++i)
    {
      c[i] = a[i] * b[i];
    }
}

my compilers (msvc 2005, gcc 4.3) unroll loops automatically.

Sergey Miryanov
  • 1,820
  • 16
  • 29