I keep getting a segfault in the following code whenever I write to `result`. The code converts a matrix of doubles (`mat->data`) into an array of `__m256d` vectors, zero-padding each row to a multiple of 4 columns where needed and transposing the 4-wide blocks at the same time. For example,

[[0, 1, 2, 3, 4, 5]
 [6, 7, 8, 9, 10, 11]]

will transpose to

[{0, 1, 2, 3}, {6, 7, 8, 9}
 {4, 5, 0, 0}, {10, 11, 0, 0}]

void transpose_alter(__m256d *result, matrix *mat) {
    for (int r = 0; r < mat->rows; r++) {
        double *mat_data_offset = (double *) mat->data + r * mat->cols;
        int c_block;
        for (c_block = 0; c_block < mat->cols / 4; c_block++) {
            result[c_block * mat->rows + r] = _mm256_loadu_pd(mat_data_offset + c_block * 4);
        }
        if (mat->cols % 4 > 0) {
            double buffer[4] = {0, 0, 0, 0};
            memcpy(buffer, mat_data_offset + c_block * 4, (mat->cols % 4) * sizeof(double));
            result[c_block * mat->rows + r] = _mm256_loadu_pd(buffer);
        }
    }
}
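
For cross-checking the output, the layout described above can also be written without intrinsics. This is only a minimal scalar sketch, not the original code: the name `transpose_alter_scalar` and the flat `double` output buffer are assumptions made for illustration, with block `(c_block, r)` stored at `out + (c_block * rows + r) * 4`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar reference for the blocked transpose.
   out must have room for ((cols + 3) / 4) * rows * 4 doubles. */
void transpose_alter_scalar(double *out, const double *data, int rows, int cols) {
    int blocks = (cols + 3) / 4; /* number of 4-wide column blocks, rounded up */

    /* Zero the whole buffer first so the padding lanes end up as 0. */
    memset(out, 0, (size_t)blocks * (size_t)rows * 4 * sizeof(double));

    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            int c_block = c / 4; /* which 4-wide block this column falls in */
            int lane = c % 4;    /* position inside that block */
            out[((size_t)c_block * rows + r) * 4 + lane] = data[(size_t)r * cols + c];
        }
    }
}
```

For the 2 x 6 example, `out` then holds the blocks {0, 1, 2, 3}, {6, 7, 8, 9}, {4, 5, 0, 0}, {10, 11, 0, 0} in that order, which can be compared lane by lane against the `__m256d` result.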

The pointer passed as `result` when it segfaults is allocated like this:

__m256d *alt = (__m256d *) malloc((mat->cols + 3) / 4 * mat->rows * sizeof(__m256d));
  • Which line segfaults? Please post a minimal reproducible example instead of a snippet. Do you check that malloc didn't return NULL? – Allan Wind Nov 22 '21 at 23:23
  • The memory returned by `malloc` is not guaranteed to be 32-byte aligned, but casting to `(__m256d*)` makes the compiler assume it is. It may then choose to store data to `result` using aligned stores. – chtz Nov 23 '21 at 01:23
  • Is there a way to force memory alignment? – Wentinn Liao Nov 23 '21 at 05:22
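
One possible way to get 32-byte alignment, as asked in the last comment, is C11 `aligned_alloc`. The sketch below assumes a C11 toolchain that provides it, which is not the case on every platform. `alloc_vectors` and `is_aligned_32` are hypothetical helper names, and `VEC_BYTES` stands in for `sizeof(__m256d)` so the snippet compiles without AVX flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VEC_BYTES 32 /* assumed equal to sizeof(__m256d): four doubles */

/* Hypothetical helper: room for ((cols + 3) / 4) * rows vectors,
   aligned to 32 bytes. Returns NULL on failure or for an empty matrix. */
void *alloc_vectors(int rows, int cols) {
    if (rows <= 0 || cols <= 0) {
        return NULL;
    }
    size_t count = (size_t)((cols + 3) / 4) * (size_t)rows;
    /* The requested size is a multiple of the alignment by construction,
       which is what C11 aligned_alloc expects. */
    return aligned_alloc(VEC_BYTES, count * VEC_BYTES);
}

/* Returns 1 if p is 32-byte aligned, 0 otherwise. */
int is_aligned_32(const void *p) {
    return ((uintptr_t)p % VEC_BYTES) == 0;
}
```

The returned pointer can be cast to `__m256d *` and released with `free`. Other options are `_mm_malloc`/`_mm_free` or `posix_memalign`, or keeping `malloc` and writing through `_mm256_storeu_pd((double *) &result[i], v)`, which does not require alignment.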
