1

Related question How to speed up reshape in higher rank tensor contraction by BLAS in Fortran?

Suppose I have a contraction A[a,b] * B[c,b,d] = C[a,c,d] (the dummy index b is in middle of tensor B, which looks tricky) and I would like to use DGEMM for it.

What I can do is

  1. reshape B[c,b,d] into B2[b,c,d]
  2. use the method in How to speed up reshape in higher rank tensor contraction by BLAS in Fortran? to evaluate A[a,b] * B2[b,c,d] = C[a,c,d]

Reshape can take a bit time. (If the target were C[c,a,d], I need one more shape). Is there more efficient approach?

AlphaF20
  • 583
  • 6
  • 14
  • 3
    First guess I'd keep it simple - A(a,b)*B(c,b) is a matrix times matrix transpose, and then there are d of them. I suspect this will work pretty well unless a, b, c, d are all very small - how big are they typically? – Ian Bush May 14 '21 at 06:21
  • 1
    Yes, IanBush's method is likely the best. You just need to call `dgemm` with `transa="N"` and `transb="T"`. This may even be faster than a non-transposed matrix multiplication. – veryreverie May 14 '21 at 07:54
  • 1
    Depending on the matrix sizes you might want to parallelize the `d` loop. For smaller matrices I'd also time `dgemm` vs. intrinsic matmul/transpose, e.g. `C(:,:,d) = matmul(A, transpose(B(:,:,d)))`. – jack May 14 '21 at 08:25
  • Yes, I considered //ising the d loop as well but it all depends on the matrix sizes - I have access to machines with 128 cores in a node currently, and that number is only going to increase. In practice some nested //isation, some in the matmult, some in the d loop, might be the best way to go. But this is far down the line and well into hand waving land. – Ian Bush May 14 '21 at 09:38
  • Typically 30~70 for `a,b,c,d`. There are some other higher-dimensional arrays. What will be the transpose on a three dimensional array? (sorry I don't know) `transpose(B(:,:,d)` will only transpose the first two indices? – AlphaF20 May 14 '21 at 17:49
  • @AlphaF20 `B(:,:,d)` is a _two_-dimensional array and `transpose` works as usual. The sizes of 30~70 sound quite small and `matmul` could be as fast as `dgemm`. Just time your code. – jack May 14 '21 at 20:45

0 Answers0