
I am trying to optimize my math calculation code base, and I found the piece of code shown below.

This code performs a matrix multiplication. However, I don't understand how an enum can be used for calculation here. `Cnt` is a type specified in

template <int I=0, int J=0, int K=0, int Cnt=0> 

and somehow we can still do

Cnt = Cnt + 1

Could anyone give me a quick explanation of how this works?

Thanks

template <int I=0, int J=0, int K=0, int Cnt=0> class MatMult
    {
    private :
        enum
            {
            Cnt = Cnt + 1,
            Nextk = Cnt % 4,
            Nextj = (Cnt / 4) % 4,
            Nexti = (Cnt / 16) % 4,
            go = Cnt < 64
            };
    public :
        static inline void GetValue(D3DMATRIX& ret, const D3DMATRIX& a, const D3DMATRIX& b)
            {
            ret(I, J) += a(K, J) * b(I, K);
            MatMult<Nexti, Nextj, Nextk, Cnt>::GetValue(ret, a, b);
            }
    };

// specialization to terminate the loop
template <> class MatMult<0, 0, 0, 64>
    {
    public :
        static inline void GetValue(D3DMATRIX& ret, const D3DMATRIX& a, const D3DMATRIX& b) { }
};

Or, to ask more specifically: how do `Nexti`, `Nextj`, `Nextk`, and `Cnt` get propagated to the next level when the loop is unrolled?
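A minimal sketch of the mechanism being asked about, reduced to a 2x2x2 index space (8 steps instead of 64). The names `Step`, `Visit`, `VisitAll`, and `Next` are hypothetical and not part of the original code. The sketch uses a separate enumerator name, `Next`, because conforming compilers generally reject an enumerator that reuses the name of a template parameter. The enumerators are compile-time constants, so they can be passed as template arguments; each call names a different instantiation of the class, which is how the values reach the "next level".

```cpp
#include <string>

// Hypothetical 2x2x2 analogue of MatMult, for illustration only.
template <int I = 0, int J = 0, int K = 0, int Cnt = 0>
struct Step
{
    enum
    {
        Next  = Cnt + 1,        // a new constant; the parameter Cnt is never modified
        NextK = Next % 2,       // innermost index changes fastest
        NextJ = (Next / 2) % 2,
        NextI = (Next / 4) % 2
    };

    static void Visit(std::string& out)
    {
        // Record the indices this instantiation was created with.
        out += std::to_string(I) + std::to_string(J) + std::to_string(K) + ' ';
        // Enumerators are constant expressions, so they are valid template
        // arguments. This names a different class: Step<NextI, NextJ, NextK, Next>.
        Step<NextI, NextJ, NextK, Next>::Visit(out);
    }
};

// Specialization that ends the recursion: when Next reaches 8,
// all three indices wrap back to 0, so this is the type that gets named.
template <>
struct Step<0, 0, 0, 8>
{
    static void Visit(std::string&) {}
};

// Starts the chain at Step<0, 0, 0, 0> and returns the visited index triples.
std::string VisitAll()
{
    std::string out;
    Step<>::Visit(out);
    return out;
}
```

Calling `VisitAll()` walks `Step<0,0,0,0>`, `Step<0,0,1,1>`, `Step<0,1,0,2>`, and so on, up to `Step<1,1,1,7>`, which then calls the empty specialization `Step<0,0,0,8>`. Nothing is mutated at run time; the "loop counter" lives entirely in the template arguments.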

thanks

user152503
  • Sounds like you could use a [good C++ book](http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list) (specifically one on template metaprogramming) – NathanOliver Mar 28 '17 at 14:57
  • `Cnt` is an `int`, not a type parameter; it is called a non-type template parameter. – Jean-Baptiste Yunès Mar 28 '17 at 15:12
  • You should be aware that this code is not even necessarily a win. The overhead of checking loop boundaries is very small, and nowadays, when processors can execute multiple instructions per cycle (IPC), especially with integers, it may be nearly free. On the other hand, this causes a lot of code bloat. If you pass the size as integer non-type template parameters but just write normal loops, you still give the compiler the option to unroll, but don't force it to. Then it will often generate very clean code that unrolls by a factor of 4 or similar. – Nir Friedman Mar 28 '17 at 18:23
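A minimal sketch of the approach described in the last comment, assuming a plain `std::array`-based matrix as a stand-in for `D3DMATRIX`. The names `Matrix` and `Multiply` are hypothetical, and the indexing follows the usual row-times-column convention rather than the original snippet's `a(K, J) * b(I, K)` layout.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for D3DMATRIX: a fixed-size square matrix.
template <std::size_t N>
using Matrix = std::array<std::array<float, N>, N>;

// N is a compile-time constant, so the compiler is free to unroll
// these loops, but it is not forced to.
template <std::size_t N>
Matrix<N> Multiply(const Matrix<N>& a, const Matrix<N>& b)
{
    Matrix<N> ret{};  // value-initialized, so every element starts at 0
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < N; ++k)
                ret[i][j] += a[i][k] * b[k][j];
    return ret;
}
```

With a 4x4 matrix the call is simply `Multiply(a, b)`, with `N` deduced from the argument types. Whether this ends up faster or slower than the fully unrolled template recursion depends on the compiler and target, so it is worth measuring both.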

0 Answers