I'm trying to implement the code shown in this pdf. More precisely (page 50):
#define SM (CLS / sizeof (double))
for (i = 0; i < N; i += SM)
for (j = 0; j < N; j += SM)
for (k = 0; k < N; k += SM)
for (i2 = 0, rres = &res[i][j],
rmul1 = &mul1[i][k]; i2 < SM;
++i2, rres += N, rmul1 += N)
for (k2 = 0, rmul2 = &mul2[k][j];
k2 < SM; ++k2, rmul2 += N)
for (j2 = 0; j2 < SM; ++j2)
rres[j2] += rmul1[k2] * rmul2[j2];
Now, as far as I'm concerned, rres
is int*
, rmul2
and rmul1
too.
Should it look like
int *rres;
int *rmul1;
int *rmul2;
for (i = 0; i < N; i += SM)
for (j = 0; j < N; j += SM)
for (k = 0; k < N; k += SM)
for (i2 = 0, *rres = &res[i][j],
*rmul1 = &mul1[i][k]; i2 < SM;
++i2, rres += N, rmul1 += N)
for (k2 = 0, *rmul2 = &mul2[k][j];
k2 < SM; ++k2, rmul2 += N)
for (j2 = 0; j2 < SM; ++j2)
rres[j2] += rmul1[k2] * rmul2[j2];
Because this seems more or less reasonable to me bit gives wrong results. For example, if I have two matrices 2 x 2, randoms values 0 or 1, I get:
-1520010527 23996350
212687419 207125308
which is far from good. My guess is that I use *
wrong but I can't tell where...
EDIT:
Declaration of res
:
int **res = (int **)malloc(N * sizeof(int *));
for (int i = 0; i < N; i++) {
res[i] = (int *)malloc(N * sizeof(int));
}
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
res[i][j] = 0;
}
}