The following code segfaults whenever it writes to result. Its purpose is to convert a matrix of doubles (mat->data, stored row-major) into an array of __m256d's, transposing at the block level and zero-padding the columns when the column count is not a multiple of 4. For example,

    [[0, 1, 2, 3, 4, 5],
     [6, 7, 8, 9, 10, 11]]

is transposed to

    [{0, 1, 2, 3}, {6, 7, 8, 9},
     {4, 5, 0, 0}, {10, 11, 0, 0}]

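To pin down the intended layout independently of the intrinsics, here is a minimal scalar sketch of the same index mapping (c_block * rows + r). The function name pack_transposed and the flat double output are hypothetical; each group of 4 consecutive doubles in out stands in for one __m256d.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar reference for the layout described above.
   out must hold ((cols + 3) / 4) * rows * 4 doubles; each group of
   4 consecutive doubles stands in for one __m256d. */
void pack_transposed(double *out, const double *data, int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    size_t blocks = (size_t)((cols + 3) / 4);

    /* Zero everything first so the padding lanes end up as 0. */
    memset(out, 0, blocks * (size_t)rows * 4 * sizeof(double));

    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            size_t c_block = (size_t)(c / 4); /* which group of 4 columns */
            size_t lane = (size_t)(c % 4);    /* position inside the group */
            out[(c_block * (size_t)rows + (size_t)r) * 4 + lane] =
                data[(size_t)r * (size_t)cols + (size_t)c];
        }
    }
}
```

Called on the 2 x 6 example above, this fills out with the four groups in the order shown, which gives something to compare the AVX version against.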
Here is the function:

    void transpose_alter(__m256d *result, matrix *mat) {
        for (int r = 0; r < mat->rows; r++) {
            /* Start of row r in the row-major source data. */
            double *mat_data_offset = (double *) mat->data + r * mat->cols;
            int c_block;
            /* Full groups of 4 columns. */
            for (c_block = 0; c_block < mat->cols / 4; c_block++) {
                result[c_block * mat->rows + r] = _mm256_loadu_pd(mat_data_offset + c_block * 4);
            }
            /* Leftover columns: copy into a zeroed buffer so the padding lanes are 0. */
            if (mat->cols % 4 > 0) {
                double buffer[4] = {0, 0, 0, 0};
                memcpy(buffer, mat_data_offset + c_block * 4, (mat->cols % 4) * sizeof(double));
                result[c_block * mat->rows + r] = _mm256_loadu_pd(buffer);
            }
        }
    }

When the segfault occurs, the pointer passed as result was allocated like this:

    __m256d *alt = (__m256d *) malloc((mat->cols + 3) / 4 * mat->rows * sizeof(__m256d));
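
One thing that may be relevant here (an assumption, not a confirmed diagnosis): malloc only guarantees alignment suitable for standard types, typically 16 bytes on x86-64, while __m256d requires 32-byte alignment. A plain assignment through a __m256d * can be compiled to an aligned store, which faults on a misaligned address. Below is a minimal sketch of a 32-byte-aligned allocation using C11 aligned_alloc; the helper names alloc_blocks and is_aligned_32 are hypothetical, and aligned_alloc is not available on every toolchain.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: room for ((cols + 3) / 4) * rows blocks of 32 bytes
   (the size of one __m256d), aligned to 32 bytes. Release with free(). */
void *alloc_blocks(int rows, int cols) {
    size_t n_blocks = (size_t)((cols + 3) / 4) * (size_t)rows;
    /* The requested size is a multiple of the alignment, as C11 expects. */
    return aligned_alloc(32, n_blocks * 32);
}

/* Returns 1 if p sits on a 32-byte boundary, 0 otherwise. */
int is_aligned_32(const void *p) {
    return ((uintptr_t)p % 32) == 0;
}
```

The returned pointer would be cast to __m256d * and passed as result. An alternative under the same assumption is to keep malloc and write each vector with _mm256_storeu_pd((double *) &result[i], v), which has no alignment requirement.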