Transpose a matrix of unrestricted size using AVX-512

Question

I'm trying to use AVX-512 to transpose a matrix, but my code currently only handles square matrices. If I don't give the arrays for A an explicit size (for example 100) before calling the function, the program aborts with `*** stack smashing detected ***: terminated Aborted (core dumped)`.

#include <immintrin.h>
#include <complex.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

void matrix_transpose_avx(float *matA_re, float *matA_im, int rowA, int colA){
    float *temp_re = (float *)malloc(rowA * colA * sizeof(float));
    float *temp_im = (float *)malloc(rowA * colA * sizeof(float));
    memcpy(temp_re, matA_re, (rowA * colA * sizeof(float)));
    memcpy(temp_im, matA_im, (rowA * colA * sizeof(float))); 
    __m512 re_vec, im_vec;

    for (int i = 0; i < rowA; ++i){
        for (int j = 0; j < colA; ++j){
            re_vec = _mm512_loadu_ps(&temp_re[j * rowA + i]);
            _mm512_storeu_ps(&matA_re[i * colA + j], re_vec);
            im_vec = _mm512_loadu_ps(&temp_im[j * rowA + i]);
            _mm512_storeu_ps(&matA_im[i * colA + j], im_vec);
        }
    } 
    free(temp_re);
    free(temp_im);
}
int main(){
    int rowA = 3;
    int colA = 2;
    float A_re[100] = {1, 2, 4, 0, 3, 6};
    float A_im[100] = {1, 2, 4, 0, 3, 6};
    matrix_transpose_avx(A_re, A_im, rowA, colA);
    return 0;
}

I hope someone can tell me how to modify the code so that the function transposes any matrix correctly, without having to give the arrays for A an explicit size (for example 100) beforehand and without changing the function's parameters.

I'd really appreciate help from an experienced coder. Thanks in advance.

That doesn't look like a correct transpose for a 100x100 matrix either. You're only ever copying contiguous 64-byte chunks of data, but that contains 16 floats that need to end up in different rows. Also, on the last iteration, you'll store to `matA_im[i * colA + j + 0..15]` since it's a 64-byte store. That will go past the end of the array, since your loop bounds have `i=99` and `j=99` at that point, so the first element of your vector store is the last element of the whole matrix. — Peter Cordes, Aug 16 '23 at 03:35
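The index mapping described in this comment can be checked against a plain scalar version first. The following is a minimal sketch, not the asker's code: `transpose_scalar` is a hypothetical helper name, and it assumes a row-major `rows` x `cols` input whose transposed `cols` x `rows` result is written back into the same buffer. Every access stays inside the `rows * cols` elements, so the array needs no extra padding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: transpose a row-major rows x cols matrix through a
   temporary copy. The result is cols x rows, row-major, stored in mat. */
static void transpose_scalar(float *mat, int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    size_t n = (size_t)rows * (size_t)cols;
    if (n == 0) return;

    float *tmp = (float *)malloc(n * sizeof *tmp);
    if (tmp == NULL) return;              /* sketch: give up silently on OOM */
    memcpy(tmp, mat, n * sizeof *tmp);

    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            /* source element (i, j) becomes destination element (j, i) */
            mat[(size_t)j * rows + i] = tmp[(size_t)i * cols + j];
        }
    }
    free(tmp);
}
```

With the 3x2 input from the question, rows (1, 2), (4, 0), (3, 6), each source row is scattered across the two output rows, which is why copying 16 contiguous floats to 16 contiguous destinations cannot produce a transpose.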
You do want to start by copying around big chunks of data, but you'll need some shuffles (and maybe blends or merge-masking) to transpose within 16x16 chunks. — Peter Cordes, Aug 16 '23 at 03:37
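One way to keep the question's function signature and still handle any `rowA` x `colA` shape is sketched below. Note that it swaps in a different technique from the in-register 16x16 shuffles suggested here: it uses strided gather loads with a tail mask, which is simpler to get right but generally slower than a shuffle-based kernel. `transpose_plane` is a hypothetical helper, the code assumes row-major storage and that `15 * colA` fits in a 32-bit `int`, and the vector path is only compiled when `__AVX512F__` is defined (for example with `-mavx512f` on GCC or Clang); otherwise the scalar loop does all the work.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: write the transpose of the row-major rows x cols
   matrix src into dst (cols x rows). src and dst must not overlap. */
static void transpose_plane(float *dst, const float *src, int rows, int cols) {
#if defined(__AVX512F__)
    /* Lane k reads source row i + k, i.e. element offset k * cols. */
    const __m512i lane = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                           8, 9, 10, 11, 12, 13, 14, 15);
    const __m512i stride = _mm512_mullo_epi32(lane, _mm512_set1_epi32(cols));
#endif
    for (int j = 0; j < cols; ++j) {      /* output row j = source column j */
        int i = 0;
#if defined(__AVX512F__)
        for (; i < rows; i += 16) {
            int left = rows - i;
            /* Enable only the lanes that map to real rows, so neither the
               gather nor the store touches memory outside the matrix. */
            __mmask16 m = left >= 16 ? (__mmask16)0xFFFF
                                     : (__mmask16)((1u << left) - 1u);
            __m512 v = _mm512_mask_i32gather_ps(_mm512_setzero_ps(), m, stride,
                                                src + (size_t)i * cols + j, 4);
            _mm512_mask_storeu_ps(dst + (size_t)j * rows + i, m, v);
        }
#endif
        for (; i < rows; ++i)             /* scalar fallback without AVX-512 */
            dst[(size_t)j * rows + i] = src[(size_t)i * cols + j];
    }
}

/* Same parameters as in the question; each array holds rowA * colA floats. */
void matrix_transpose_avx(float *matA_re, float *matA_im, int rowA, int colA) {
    assert(rowA >= 0 && colA >= 0);
    size_t n = (size_t)rowA * (size_t)colA;
    if (n == 0) return;

    float *tmp = (float *)malloc(n * sizeof *tmp);
    if (tmp == NULL) return;              /* sketch: give up silently on OOM */

    memcpy(tmp, matA_re, n * sizeof *tmp);
    transpose_plane(matA_re, tmp, rowA, colA);
    memcpy(tmp, matA_im, n * sizeof *tmp);
    transpose_plane(matA_im, tmp, rowA, colA);
    free(tmp);
}
```

Because the last partial vector is masked, the caller can declare the arrays with exactly `rowA * colA` elements, for instance `float A_re[6]` for the 3x2 example. For large matrices, a blocked kernel that loads contiguous rows and shuffles within 16x16 tiles should make better use of memory bandwidth than gathers.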
@PeterCordes Thank you for your suggestions. I made modifications based on your comments and made sure the accesses won’t go out of bounds, but the matrix still can’t be transposed. When I try to transpose a 16*16 matrix, the code runs with i=0 and j=0~15, so I can't use the element-wise approach. I think the problem is related to the matrix element addresses (e.g. `&temp_re[j * rowA + i]`, `&matA_re[i * colA + j]`). My modified code and issue: https://stackoverflow.com/questions/76910289/transpose-a-matrix-of-unrestricted-size-using-avx-512 I hope you can give me some valuable comments. Thank you very much. — rcheni, Aug 16 '23 at 21:11
Welcome to Stack Overflow. Instead of re-asking very similar questions, you can edit your question. (Just for the future: I'm voting to close this question now and keep your newest question active.) — chtz, Aug 17 '23 at 07:56
