0

I found an answer to this question very well described for a rectangle matrix n=2 in here

What is the fastest way to transpose a matrix in C++?

But my question is how to do the same in more general case. Let have B a transpose of A as

B[i1][i2][J][i4][K][i6][i7] = A[i1][i2][K][i4][J][i6][i7]

so in this particular case of n=7 we do transpose between 3th and 5th index marked as J, K. I assume that whole data structure is within compact memory block float*. Brackets above is used just for symbolic expression of transform operation.

I am about to deal with larger dimensions (possibly n=7) with a lots of data (some dimensions has lower rank about 3-5 and some of them are realatively large about 1000).

Is there a way how to make a really fast algorithm avoid chache-misses or even better how to use advatage of SSE (or AVX) intrinsics just like in the question mentioned above?

Aamir
  • 1,974
  • 1
  • 14
  • 18
  • 4
    sometimes the fasted way to do something is to not do it. Rather than transposing a matrix `M` you can access `M[j][i]` instead of `M[j][i]` – 463035818_is_not_an_ai Sep 08 '22 at 12:44
  • 1
    @463035818_is_not_a_number I think you made a mistake in either `M[j][i]` or in `M[j][i]`... :) – Some programmer dude Sep 08 '22 at 12:48
  • 2
    indeed ;). Btw the main counter argument in the other quesiton for swapping indices not being sufficient is cache-misses. Though, I doubt that it will actually have any impact for a 7 dimensional data structure with many entries, where I'd expect cache misses when going along some inner dimension anyhow. – 463035818_is_not_an_ai Sep 08 '22 at 12:53
  • Well if a special case when one of the index is always the last one is possible to speed up more then two times, I can imagine that it could be beneficial to do this swap twice to cover general case. However the fact is that I have to go along all the data (no miss) and every single one element ends up in different place in the memory. The only exception is data on the same value of both swapping indices (in the same way as diagonal elements on square matrix). If every element is moved somewhere, there has to be some way how do it in some kind of blocks I believe. – Marek Basovník Sep 08 '22 at 14:13

1 Answers1

0

Have an index table to access your underlying matrix:

template <std::size_t N>
class Matrix {
  std::array<std::size_t, N> index; // init to 0, 1, 2, ..., N
  void transpose(std::size_t i, std::size_t j) {
    std::swap(index[i],index[j]);
  }
  // access M[index[i]...] instead of M[i...]
};
bitmask
  • 32,434
  • 14
  • 99
  • 159