I have a typical algorithm for matrix multiplication. I am trying to apply and understand loop unrolling, but I run into a problem when the matrix size isn't a multiple of the unroll factor: I get very large numbers as a result. That suggests I don't understand how to handle the remaining elements after unrolling. Here is what I have:
void Mult_Matx(unsigned long* a, unsigned long* b, unsigned long* c, long n)
{
    long i = 0, j = 0, k = 0;
    unsigned long sum, sum1, sum2, sum3, sum4, sum5, sum6, sum7;
    for (i = 0; i < n; i++)
    {
        long in = i * n;
        for (j = 0; j < n; j++)
        {
            sum = sum1 = sum2 = sum3 = sum4 = sum5 = sum6 = sum7 = 0;
            for (k = 0; k < n; k += 8)
            {
                sum = sum + a[in + k] * b[k * n + j];
                sum1 = sum1 + a[in + (k + 1)] * b[(k + 1) * n + j];
                sum2 = sum2 + a[in + (k + 2)] * b[(k + 2) * n + j];
                sum3 = sum3 + a[in + (k + 3)] * b[(k + 3) * n + j];
                sum4 = sum4 + a[in + (k + 4)] * b[(k + 4) * n + j];
                sum5 = sum5 + a[in + (k + 5)] * b[(k + 5) * n + j];
                sum6 = sum6 + a[in + (k + 6)] * b[(k + 6) * n + j];
                sum7 = sum7 + a[in + (k + 7)] * b[(k + 7) * n + j];
            }
            if (n % 8 != 0)
            {
                for (k = 8 * (n / 8); k < n; k++)
                {
                    sum = sum + a[in + k] * b[k * n + j];
                }
            }
            c[in + j] = sum + sum1 + sum2 + sum3 + sum4 + sum5 + sum6 + sum7;
        }
    }
}
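To see which values of k the 8-way version actually touches, here is a small sketch that copies only the loop bounds from the function above and does no multiplication. The name trace_k_indices and its counts array are hypothetical, added just for this illustration; counts[k] records how many times a[in + k] would be read for one (i, j) pair.

```c
#include <assert.h>

/* Mimics ONLY the k-loop bounds of the 8-way unrolled Mult_Matx above
   (hypothetical helper, no arithmetic). counts[k] is incremented each
   time the original loops would read a[in + k] for a single (i, j).
   counts must have room for n + 8 entries, because the main loop can
   run up to 7 indices past n - 1. */
void trace_k_indices(long n, int *counts)
{
    long k;
    assert(n >= 0);

    for (k = 0; k < n + 8; k++)
        counts[k] = 0;

    /* Main loop, same bounds as the original: k < n, step 8,
       then touches k, k+1, ..., k+7 unconditionally. */
    for (k = 0; k < n; k += 8)
        for (long u = 0; u < 8; u++)
            counts[k + u]++;

    /* Remainder loop, same bounds as the original. */
    if (n % 8 != 0)
        for (k = 8 * (n / 8); k < n; k++)
            counts[k]++;
}
```

For n = 12, counts comes out as 1 for k = 0 to 7, 2 for k = 8 to 11, and 1 for k = 12 to 15. The main loop's second pass (k = 8) already runs past the end of the row, and then the remainder loop adds k = 8 to 11 a second time. Reading b[k * n + j] with k >= n also goes past the end of b, which is undefined behavior and would plausibly explain the very large values.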
Let's say the size, n, is 12. When I unroll it 4 times, the code works, because 12 is a multiple of 4 and it never enters the remainder loop. But when it does enter the remainder loop, as with the 8-way unrolling above, I lose track of what's going on! If anyone can point out where I am going wrong, I'd really appreciate it. I am new to this and having a hard time figuring it out.
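For comparison, here is a sketch of one common way to structure the remainder, assuming the same row-major n x n layout as above. It is one possible fix, not the only one: the main loop only runs while all eight indices k to k + 7 are still below n, and the remainder loop simply continues from whatever k the main loop stopped at. Mult_Matx_naive is a hypothetical plain triple loop added only so the unrolled version can be checked against it.

```c
#include <assert.h>

/* Sketch: 8-way unrolled c = a * b for n x n row-major matrices,
   with the main-loop bound fixed so it never reads past index n - 1. */
void Mult_Matx(unsigned long *a, unsigned long *b, unsigned long *c, long n)
{
    assert(n >= 0);
    for (long i = 0; i < n; i++)
    {
        long in = i * n;
        for (long j = 0; j < n; j++)
        {
            unsigned long sum = 0, sum1 = 0, sum2 = 0, sum3 = 0,
                          sum4 = 0, sum5 = 0, sum6 = 0, sum7 = 0;
            long k;

            /* Main loop: run only while k + 7 is still a valid index. */
            for (k = 0; k + 7 < n; k += 8)
            {
                sum  += a[in + k]     * b[k * n + j];
                sum1 += a[in + k + 1] * b[(k + 1) * n + j];
                sum2 += a[in + k + 2] * b[(k + 2) * n + j];
                sum3 += a[in + k + 3] * b[(k + 3) * n + j];
                sum4 += a[in + k + 4] * b[(k + 4) * n + j];
                sum5 += a[in + k + 5] * b[(k + 5) * n + j];
                sum6 += a[in + k + 6] * b[(k + 6) * n + j];
                sum7 += a[in + k + 7] * b[(k + 7) * n + j];
            }

            /* Remainder loop: starts where the main loop stopped and
               handles the last n % 8 values of k (0 to 7 of them). */
            for (; k < n; k++)
                sum += a[in + k] * b[k * n + j];

            c[in + j] = sum + sum1 + sum2 + sum3 + sum4 + sum5 + sum6 + sum7;
        }
    }
}

/* Hypothetical reference version (no unrolling), used only for checking. */
void Mult_Matx_naive(unsigned long *a, unsigned long *b, unsigned long *c, long n)
{
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n; j++)
        {
            unsigned long sum = 0;
            for (long k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}
```

Because the remainder loop reuses k instead of recomputing 8 * (n / 8), the two loops cannot overlap, and the separate if (n % 8 != 0) check is no longer needed: when n is a multiple of 8, the remainder loop just runs zero times.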