C++ Armadillo and OpenMp: Parallelization of summation of outer products - define reduction for Armadillo matrix

Question

I am trying to parallelize a for loop using OpenMP which sums over Armadillo matrices. I have the following code:

#include <armadillo>
#include <omp.h>

int main()
{

        arma::mat A = arma::randu<arma::mat>(1000,700);
        arma::mat X = arma::zeros(700,700);
        arma::rowvec point = A.row(0);

        # pragma omp parallel for shared(A) reduction(+:X)
        for(unsigned int i = 0; i < A.n_rows; i++){
                arma::rowvec diff = point - A.row(i);
                X += diff.t() * diff; // Adding the matrices to X here
        }

}

I am getting this error:

[Legendre@localhost ~]$ g++ test2.cpp -o test2 -O2 -larmadillo -fopenmp
test2.cpp: In function ‘int main()’:
test2.cpp:11:52: error: user defined reduction not found for ‘X’

I read up on defining reductions, but I haven't found examples for working with Armadillo matrices. What is the best way to define a reduction for Armadillo matrices in my case?

Henri Menke · Accepted Answer · 2017-07-09T05:45:33.077

Those reductions are only available for built-in types (double, int, etc.). Thus you have to do the reduction yourself, which is simple. Just accumulate the results for each thread in a thread-local variable and add this to the global result within a critical section.

#include <armadillo>
#include <omp.h>

int main()
{

  arma::mat A = arma::randu<arma::mat>(1000,700);
  arma::mat X = arma::zeros(700,700);
  arma::rowvec point = A.row(0);

  #pragma omp parallel shared(A)
  {
    arma::mat X_local = arma::zeros(700,700);

    #pragma omp for
    for(unsigned int i = 0; i < A.n_rows; i++)
    {
      arma::rowvec diff = point - A.row(i);
      X_local += diff.t() * diff; // Adding the matrices to X here
    }

    #pragma omp critical
    X += X_local;
  }
}

With more recent OpenMP (4.5 I think?) you can also declare a user-defined reduction for your type.

#include <armadillo>
#include <omp.h>

#pragma omp declare reduction( + : arma::mat : omp_out += omp_in ) \
  initializer( omp_priv = omp_orig )

int main()
{

  arma::mat A = arma::randu<arma::mat>(1000,700);
  arma::mat X = arma::zeros(700,700);
  arma::rowvec point = A.row(0);

  #pragma omp parallel shared(A) reduction(+:X)
  for(unsigned int i = 0; i < A.n_rows; i++)
  {
    arma::rowvec diff = point - A.row(i);
    X += diff.t() * diff; // Adding the matrices to X here
  }
}

Thanks. How come you don't need the critical clause before the addition to X_local? Also I tried the second answer, and it gave me this error (I know there is nothing wrong with the dimensions of my matrices): `error: addition: incompatible matrix dimensions: 0x0 and 700x700 terminate called after throwing an instance of 'std::logic_error' what(): addition: incompatible matrix dimensions: 0x0 and 700x700 terminate called recursively Aborted (core dumped)` — Cauchy, Jul 09 '17 at 05:47
@Cauchy (1) `X_local` is local to each thread. The directive `#pragma omp for` splits up the for loop into partitions for each thread, so no two threads write into the same `X_local`. (2) Yes, that was a mistake. I forgot the initializer and the matrix was default-initialized to 0x0. See the updated answer. — Henri Menke, Jul 09 '17 at 05:52
@Cauchy By default all variables from the outer scope enter the parallel region as shared (with the excpetion of the loop variable), so you don't have to mark anything explicitly shared. — Henri Menke, Jul 09 '17 at 08:09

C++ Armadillo and OpenMp: Parallelization of summation of outer products - define reduction for Armadillo matrix

1 Answers1

Linked