2

I'm trying to implement the code shown in this pdf. More precisely (page 50):

#define SM (CLS / sizeof (double))

for (i = 0; i < N; i += SM)
  for (j = 0; j < N; j += SM)
    for (k = 0; k < N; k += SM)
      for (i2 = 0, rres = &res[i][j],
           rmul1 = &mul1[i][k]; i2 < SM;
           ++i2, rres += N, rmul1 += N)
         for (k2 = 0, rmul2 = &mul2[k][j];
              k2 < SM; ++k2, rmul2 += N)
           for (j2 = 0; j2 < SM; ++j2)
              rres[j2] += rmul1[k2] * rmul2[j2];

Now, as far as I'm concerned, rres is int*, rmul2 and rmul1 too.

Should it look like

int *rres;
int *rmul1;
int *rmul2;

for (i = 0; i < N; i += SM)
   for (j = 0; j < N; j += SM)
      for (k = 0; k < N; k += SM)
         for (i2 = 0, *rres = &res[i][j],
              *rmul1 = &mul1[i][k]; i2 < SM;
              ++i2, rres += N, rmul1 += N)
            for (k2 = 0, *rmul2 = &mul2[k][j];
                 k2 < SM; ++k2, rmul2 += N)
               for (j2 = 0; j2 < SM; ++j2)
                  rres[j2] += rmul1[k2] * rmul2[j2];

Because this seems more or less reasonable to me bit gives wrong results. For example, if I have two matrices 2 x 2, randoms values 0 or 1, I get:

-1520010527 23996350 
212687419 207125308

which is far from good. My guess is that I use * wrong but I can't tell where...

EDIT: Declaration of res:

  int **res = (int **)malloc(N * sizeof(int *));
  for (int i = 0; i < N; i++) {
      res[i] = (int *)malloc(N * sizeof(int));
  }
  for (int i = 0; i < N; i++) {
      for (int j = 0; j < N; j++) {
          res[i][j] = 0;
      }
  }
chqrlie
  • 131,814
  • 10
  • 121
  • 189
Nerwena
  • 23
  • 6
  • 1
    How is `res` declared/initialized? – Scott Hunter Jan 15 '21 at 20:10
  • @ScottHunter I edited the question. – Nerwena Jan 15 '21 at 20:13
  • `*rres = &res[i][j]` looks weird: it writes into the pointer `rres` the _address_ of some number `res[i][j]`. So, the value of `*rres` (_not_ `rres`!) will be some memory address. Is this intended? – ForceBru Jan 15 '21 at 20:13
  • Consider `int** res = malloc(sizeof *res * N);`: Drop the [unneeded cast](https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc) and size to the object, not type. – chux - Reinstate Monica Jan 15 '21 at 20:18
  • @ForceBru not really, I added this because without it I get `subscripted value is neither array nor pointer nor vector rres[j2] += rmul1[k2] * rmul2[j2];`. I had a feeling there might be a problem with this, but I have no idea how to do it correctly. I mean I'd like `rres` to write the results to `res`. – Nerwena Jan 15 '21 at 20:21
  • @chux-ReinstateMonica that isn't the problem here but thanks for the tip. – Nerwena Jan 15 '21 at 20:24
  • 2
    In Drepper's paper, the example code is for multiplying two-dimensional arrays of `double`s. In that case, `rres`, `rmul1`, and `rmul2` would all have type `double *`. More importantly, however, **Drepper's `res` is an array of arrays, whereas yours is an array of pointers**. That makes a tremendous difference. – John Bollinger Jan 15 '21 at 22:10
  • Same for Drepper's `mul1` and `mul2`. – John Bollinger Jan 15 '21 at 22:22
  • @JohnBollinger I wanted to use `int` since I don't really need `double` here; of course, I took that into account. Oh, I had a feeling this also might be the problem here. Well, I didn't want to do `arr[n][n]` because the size varies but int that case maybe I should stick to that. – Nerwena Jan 15 '21 at 22:27
  • 1
    @Nerwena, The discussion from which you drew the code is all about optimizing for efficient cache usage. Doing so depends intimately on the layout of the data. There is no point in going through the exercise if you do not use the style of data layout that the code is intended for. – John Bollinger Jan 15 '21 at 22:34
  • *Should there be pointers?* no there should not be stars in those `for` statements. – chqrlie Jan 15 '21 at 22:39
  • 1
    `*rres = &res[i][j]` should give a compilation error. If it doesn't then adjust your compiler settings as it is causing you to wasting time by not having that information available – M.M Jan 15 '21 at 23:38

1 Answers1

3

Your assumption about the type of rres, rmul2 and rmul1 seems incorrect: the matrix elements probably have double type, and the expressions in the initial clauses of the for loops should definitely not have an initial *.

Yet this matrix multiplication code should work the same for any element type, but the cache optimisation explained by Ulrich Drepper makes another assumption: N must be a multiple of SM, and it is unlikely to be the case with #define SM (CLS / sizeof(double)) and N = 2 in your test case, which might explain your observations.

The code should be:

#define SM  (CLS / sizeof(double))

void matrix_multiply(double res[N][N], const double mul1[N][N], const double mul2[N][N]) {
    size_t i, j, k, i2, j2, k2;
    double *rres, *rmul1, *rmul2;
    
    assert(N % SM == 0);

    for (i = 0; i < N; i += SM) {
        for (j = 0; j < N; j += SM) {
            for (k = 0; k < N; k += SM) {
                for (i2 = 0, rres = &res[i][j], rmul1 = &mul1[i][k];
                     i2 < SM; ++i2, rres += N, rmul1 += N) {
                    for (k2 = 0, rmul2 = &mul2[k][j];
                         k2 < SM; ++k2, rmul2 += N) {
                        for (j2 = 0; j2 < SM; ++j2)
                            rres[j2] += rmul1[k2] * rmul2[j2];
                    }
                }
            }
        }
    }
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189