
I am interested in matrix-vector multiplication and am comparing the speed of two implementations. One function represents the matrix as a 1D array and the other represents it as a 2D array. The 2D version is consistently faster when I run it, and I can't figure out why.

I've tried looking through books but haven't found an explanation.

Matrix as a 1D array (row-major):

void matrix_mult_vector_v2(const double* A, const double* x, double* result, int n_rows, int n_cols) {

    int row, col;
    double sum ;


    #pragma omp parallel shared(A, x, result, n_rows, n_cols) private(row, col, sum)
    {
        #pragma omp for schedule(static)
        for (row = 0; row < n_rows ; row++) {
            sum = 0.0;

            for(col = 0; col < n_cols; col++){

                int i = col + row * n_cols;

                sum += A[i] * x[col];
            }

            result[row] = sum;
        }
    }

}

Matrix as a 2D array (double**, i.e. an array of row pointers):

/*
 * Matrix multiply vector
 * result = A * x where A is a matrix and x is a vector
 */

void matrix_mult_vector(double** A, double* x, double* result, int n_rows, int n_cols)
{

    int row, col;
    double sum;

    #pragma omp parallel shared(A, x, result, n_rows, n_cols) private(row, col, sum)
    {

        //#pragma omp parallel for collapse(2)
        #pragma omp for schedule(static)
        for (row = 0; row < n_rows ; row++) {
            sum = 0.0;

            for(col = 0; col < n_cols; col++){
                //#pragma omp atomic

                sum += A[row][col] * x[col];
            }

            result[row] = sum;
        }
    }
}
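The question doesn't show how A is allocated for the double** version. If each row is malloc'd separately, the rows need not be contiguous, so the two functions are not necessarily reading the same memory layout, which makes the timing comparison harder to interpret. Below is a minimal sketch, assuming you want both functions to read identical data. It allocates one contiguous row-major block and builds the row-pointer array on top of it. The names alloc_matrix and free_matrix are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates an n_rows x n_cols matrix as one
 * contiguous row-major block (*data_out) plus an array of row pointers
 * into that block (the return value). Returns NULL on failure. */
double **alloc_matrix(int n_rows, int n_cols, double **data_out)
{
    double *data = malloc((size_t)n_rows * (size_t)n_cols * sizeof *data);
    double **rows = malloc((size_t)n_rows * sizeof *rows);
    if (data == NULL || rows == NULL) {
        free(data);
        free(rows);
        return NULL;
    }
    /* Row r starts r * n_cols elements into the block, so
     * rows[r][c] and data[r * n_cols + c] are the same element. */
    for (int r = 0; r < n_rows; r++)
        rows[r] = data + (size_t)r * (size_t)n_cols;
    *data_out = data;
    return rows;
}

/* Releases both allocations made by alloc_matrix. */
void free_matrix(double **rows, double *data)
{
    free(data);
    free(rows);
}
```

With this layout, matrix_mult_vector_v2(data, x, result, n_rows, n_cols) and matrix_mult_vector(A, x, result, n_rows, n_cols) read exactly the same elements. Any remaining timing difference is then less likely to come from the data layout itself.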

Both functions run without errors, and each should compute result = A*x, where A is an n_rows-by-n_cols matrix and x is a vector.
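The question doesn't say how the timings were taken. The block below is a minimal sketch of two helpers for a fairer measurement, assuming a C11 compiler. wall_time() reads wall-clock time, using omp_get_wtime() when built with OpenMP and timespec_get otherwise. max_abs_diff() checks that both versions produce the same result vector. Both names are hypothetical.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds. For a multithreaded kernel, elapsed wall
 * time is what matters; clock() would sum CPU time across threads. */
double wall_time(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    /* C11 fallback; TIME_UTC follows the system clock, which is
     * acceptable for a rough benchmark sketch. */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
#endif
}

/* Largest element-wise difference between two result vectors, used to
 * confirm that the 1D and 2D versions compute the same A*x. */
double max_abs_diff(const double *a, const double *b, int n)
{
    double m = 0.0;
    for (int i = 0; i < n; i++) {
        double d = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        if (d > m)
            m = d;
    }
    return m;
}
```

One way to use these is to call each kernel once untimed, so that one-off costs such as OpenMP thread startup and cold caches aren't charged to whichever version runs first. Then time several repetitions of each version and compare the minimum. Running the two versions in both orders is also worthwhile. Build both with the same flags (for example gcc -O2 -fopenmp).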

John Lee
  • You ask why does an unspecified compiler optimo – tim18 May 18 '19 at 12:03
  • this may help: https://stackoverflow.com/questions/17259877/1d-or-2d-array-whats-faster – seleciii44 May 18 '19 at 12:04
  • Please present your specific performance results, show how you measure, compile, and wrap the code into a [mcve]. – Zulan May 18 '19 at 12:04
  • Optimizes away the race in one case and not the other? Or maybe it doesn't move the integer multiply out of the inner loop? Look at your vectorization reports and fix locality of sum. – tim18 May 18 '19 at 12:07
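Below is a sketch of one way to apply tim18's suggestions to the 1D version. It assumes the gap comes from the inner-loop indexing, which is not a confirmed diagnosis. The sketch makes four changes:

- It computes the row's base pointer once per row instead of evaluating col + row * n_cols on every iteration.
- It declares sum inside the loop. The original private(sum) clause was already correct; this only makes the scoping explicit.
- It adds restrict, so the compiler may assume A, x and result don't overlap.
- It does the offset arithmetic in size_t, since the int expression col + row * n_cols overflows for matrices with more than INT_MAX elements.

The name matrix_mult_vector_v3 is hypothetical.

```c
#include <stddef.h>

/* Hypothetical variant of matrix_mult_vector_v2 (row-major 1D storage). */
void matrix_mult_vector_v3(const double *restrict A, const double *restrict x,
                           double *restrict result, int n_rows, int n_cols)
{
    #pragma omp parallel for schedule(static)
    for (int row = 0; row < n_rows; row++) {
        /* Start of this row, computed once per row in size_t. */
        const double *a_row = A + (size_t)row * (size_t)n_cols;
        double sum = 0.0;  /* local to the iteration, hence private per thread */
        for (int col = 0; col < n_cols; col++)
            sum += a_row[col] * x[col];
        result[row] = sum;
    }
}
```

To check whether either loop was actually vectorized, GCC can report it with -fopt-info-vec (or -fopt-info-vec-missed). Compilers typically won't vectorize a floating-point sum reduction like this one, in either version, unless reassociation is permitted (for example with -ffast-math).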
