I have a column-major matrix and I want to convert it to a row-major matrix using Arm SVE instructions. I know about the gather and scatter instructions, but they are not good enough for my case. Does anyone have an idea?
-
You need to implement a matrix transposition. A good fast way to do so is the *recursive matrix transposition algorithm* where you recursively split the matrix into 4 block matrices, transpose each block and then swap the top right with the bottom left block. – fuz Jun 17 '22 at 11:29
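As a rough illustration of the recursive scheme described in this comment, here is a minimal scalar sketch in plain C. It is not SVE code and not necessarily what the commenter had in mind: it assumes a square `float` matrix whose side is a power of two, and the names `transpose_rec`, `swap_blocks`, and `transpose_inplace` are made up for this example. For a square matrix, converting column-major to row-major is the same as transposing in place.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two n x n blocks that live inside a larger matrix whose rows
 * (or columns, the layout does not matter here) are ld elements apart. */
static void swap_blocks(float *a, float *b, size_t n, size_t ld)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float t = a[i * ld + j];
            a[i * ld + j] = b[i * ld + j];
            b[i * ld + j] = t;
        }
    }
}

/* Transpose the n x n block starting at m in place.
 * Assumption for this sketch: n is a power of two. */
static void transpose_rec(float *m, size_t n, size_t ld)
{
    if (n == 1)
        return;                      /* a 1 x 1 block is its own transpose */

    size_t h = n / 2;
    float *tl = m;                   /* top left     */
    float *tr = m + h;               /* top right    */
    float *bl = m + h * ld;          /* bottom left  */
    float *br = m + h * ld + h;      /* bottom right */

    /* Transpose each of the four quadrants ... */
    transpose_rec(tl, h, ld);
    transpose_rec(tr, h, ld);
    transpose_rec(bl, h, ld);
    transpose_rec(br, h, ld);

    /* ... then exchange the two off-diagonal quadrants. */
    swap_blocks(tr, bl, h, ld);
}

/* Hypothetical entry point: transpose an n x n matrix in place. */
static void transpose_inplace(float *m, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);   /* power-of-two side only */
    transpose_rec(m, n, n);
}
```

In a real implementation the recursion would presumably stop at a block size that fits a few vector registers, and that base case is where SIMD shuffles would take over from the scalar loops shown here.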
-
See [fast bit-matrix (64x64) transpose algorithm using SIMD (ARM)](https://stackoverflow.com/q/71552776) for a recursive transpose using fast contiguous loads. (That transposes bits, but if you leave out the within-byte steps and just look at groups of 1, 2, 4 and 8 bytes, you'll see a good way to transpose byte or int/float matrices). SVE has gather/scatter, but that doesn't mean it's fast in terms of bytes loaded per cycle. Cache access patterns are still critically important. I'd assume there's a simpler example somewhere for ARM NEON or maybe even SVE of transposing a float matrix – Peter Cordes Jun 17 '22 at 23:55
-
Note that for some use-cases, it can make sense to just transpose on the fly (possibly even with SVE gathers, although that could perform really badly). Or adapt your next use of the matrix to its layout (e.g. a matmul). But if you need to use it multiple times, transposing once can make all of those later uses more efficient in speed and cache footprint, and convenient for SIMD with the layout of another matrix. – Peter Cordes Jun 17 '22 at 23:58
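To make the "adapt your next use of the matrix to its layout" idea concrete, here is a small hedged sketch in plain C. It assumes, purely for illustration, that the next use is a matrix-vector product; the function name `matvec_colmajor` is invented for this example. Instead of transposing first, it walks the column-major data column by column, so every inner loop reads contiguous memory.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x, where A has `rows` rows and `cols` columns and is stored
 * column-major: element (i, j) is a[j * rows + i].
 * Each column is contiguous, so the inner loop is a unit-stride
 * "y += x[j] * column" update that vectorizes without any gathers. */
void matvec_colmajor(const float *a, const float *x, float *y,
                     size_t rows, size_t cols)
{
    assert(a != NULL && x != NULL && y != NULL);

    for (size_t i = 0; i < rows; i++)
        y[i] = 0.0f;

    for (size_t j = 0; j < cols; j++) {
        const float *col = a + j * rows;   /* start of column j */
        float xj = x[j];
        for (size_t i = 0; i < rows; i++)
            y[i] += xj * col[i];
    }
}
```

Whether this beats transposing once up front depends on how many times the matrix is reused, as the comment points out.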