
As my last question said (How do I deal with a data race in OpenMP?), there are three solutions for doing an aggregation, as in @wolfpack88's answer. But the performance of the three solutions differs: the reduction is twice as fast as the others.

So my question is: why does this happen, and how can I use critical and atomic to get the same performance?
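For reference, the three variants being compared probably look something like the sketch below. The original code is not shown, so the workload (summing the integers 0..n-1) and the function names are hypothetical. Only the synchronization strategy differs between them.

```c
/* Hypothetical workload: sum the integers 0..n-1 into one shared total. */

/* Variant 1: reduction. Each thread adds into its own private copy of
   `sum`; OpenMP combines the copies when the loop finishes. */
long long sum_reduction(long long n)
{
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long long i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Variant 2: critical. Every single update enters the critical section,
   so all n additions are serialized behind a lock. */
long long sum_critical(long long n)
{
    long long sum = 0;
    #pragma omp parallel for
    for (long long i = 0; i < n; i++) {
        #pragma omp critical
        sum += i;
    }
    return sum;
}

/* Variant 3: atomic. Every single update is an atomic read-modify-write
   on the same shared variable, so threads still contend n times. */
long long sum_atomic(long long n)
{
    long long sum = 0;
    #pragma omp parallel for
    for (long long i = 0; i < n; i++) {
        #pragma omp atomic
        sum += i;
    }
    return sum;
}
```

All three return the same result (compile with `-fopenmp` or your compiler's equivalent to enable the pragmas); they differ only in how often threads have to synchronize.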

YOung
  • `reduction` can be parallelized to a certain degree (for example, by combining the partial results in a binary tree), while the others can't: their updates are executed sequentially. That is exactly what `critical` and `atomic` are designed for, so you can't change that easily. – Alexander Dec 08 '14 at 15:38

1 Answer


When you use the reduction clause, the compiler creates a private copy of each variable listed in reduction(operator:list) for every thread. Each thread updates only its own copy, and at the end of the construct the specified operator is applied to combine all the private copies into the original shared variable. Hence the threads do not have to wait for each other while the loop runs, which gives better performance than atomic or critical, where every single update to the shared variable is serialized: with critical each thread must acquire a lock, and with atomic (usually implemented as a hardware atomic instruction rather than a lock) the threads still contend for the same memory location on every update.
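The same idea can be applied by hand to get critical or atomic close to reduction: accumulate into a thread-private local variable and synchronize only once per thread, so the number of synchronized updates drops from n to the number of threads. This is a minimal sketch under the same hypothetical workload (summing 0..n-1); the function names are illustrative, not from the original answer.

```c
/* Each thread sums its share of the iterations into a private local,
   then publishes it with a single atomic update. */
long long sum_private_then_atomic(long long n)
{
    long long sum = 0;
    #pragma omp parallel
    {
        long long local = 0;          /* private: declared inside the region */
        #pragma omp for
        for (long long i = 0; i < n; i++)
            local += i;               /* no synchronization needed here */
        #pragma omp atomic
        sum += local;                 /* one synchronized update per thread */
    }
    return sum;
}

/* Same pattern with critical: the lock is taken once per thread
   instead of once per iteration. */
long long sum_private_then_critical(long long n)
{
    long long sum = 0;
    #pragma omp parallel
    {
        long long local = 0;
        #pragma omp for
        for (long long i = 0; i < n; i++)
            local += i;
        #pragma omp critical
        sum += local;
    }
    return sum;
}
```

This is essentially what the reduction clause does for you; the remaining difference is only how the per-thread partial results are combined, which is negligible when the loop is long.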

orak