2

I know I can do a parallel reduction to sum up the elements of an array in parallel, but it is a little difficult for me to follow. I saw that cuBLAS has a function called cublasDasum that sums up the absolute values of the elements. It seems there should be a very similar function that sums up the elements themselves, not their absolute values. Is there any way I can find the source code of cublasDasum and see how this is done?

Adding up the elements of an array is such a basic operation; I can't believe there is no function that does it.

Farzad
codeCcode
  • Sum of vector values on GPU is likely useless because of the bottleneck of transferring data to the GPU: http://stackoverflow.com/questions/15194798/vector-step-addition-slower-on-cuda You need more computational intensity per byte (e.g. matrix multiplication) to see any speedup. – Ciro Santilli OurBigBook.com May 10 '16 at 19:25

2 Answers

2

Take a look at the answers here for some good ideas. Thrust has pretty easy-to-use reduction operations; a sketch follows.
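For instance, summing on the device with Thrust can be a one-liner. This is a minimal sketch, assuming the data is already in (or copied into) a thrust::device_vector; the example values are made up for illustration:

```cpp
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cstdio>

int main()
{
    // Example data: 1000 elements, each 1.0, so the expected sum is 1000.0
    thrust::device_vector<double> d_vec(1000, 1.0);

    // thrust::reduce runs the parallel reduction for you;
    // 0.0 is the initial value, thrust::plus<double>() the binary operation.
    double sum = thrust::reduce(d_vec.begin(), d_vec.end(),
                                0.0, thrust::plus<double>());

    printf("sum = %f\n", sum);
    return 0;
}
```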

You can sum all the elements of a matrix by treating it as a 1 x N row vector, creating an N x 1 column vector of ones, and doing a cublasDgemm operation.
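Here is one possible minimal sketch of that cublasDgemm trick (untested, error checking omitted; the variable names and example sizes are mine). It treats the N elements as a 1 x N matrix, multiplies by an N x 1 vector of ones, and copies the 1 x 1 result back:

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main()
{
    const int N = 1000;
    std::vector<double> h_x(N, 1.0);      // host data, expected sum = 1000.0
    std::vector<double> h_ones(N, 1.0);   // the N x 1 vector of ones

    double *d_x, *d_ones, *d_sum;
    cudaMalloc(&d_x,    N * sizeof(double));
    cudaMalloc(&d_ones, N * sizeof(double));
    cudaMalloc(&d_sum,  sizeof(double));
    cudaMemcpy(d_x,    h_x.data(),    N * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ones, h_ones.data(), N * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C(1x1) = alpha * A(1xN) * B(Nx1) + beta * C,
    // i.e. the dot product of d_x with the vector of ones.
    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                1, 1, N,          // m, n, k
                &alpha,
                d_x, 1,           // A, lda
                d_ones, N,        // B, ldb
                &beta,
                d_sum, 1);        // C, ldc

    double sum = 0.0;
    cudaMemcpy(&sum, d_sum, sizeof(double), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", sum);

    cublasDestroy(handle);
    cudaFree(d_x); cudaFree(d_ones); cudaFree(d_sum);
    return 0;
}
```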

I don't think you're going to find the source code for cublas anywhere.

Robert Crovella
-1

You can use cublasDaxpy (the AXPY BLAS equivalent) with alpha = 1, which performs:

y = alpha * x + y

And if you work on matrices, you can use cublasDgeam (no BLAS equivalent).
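A minimal sketch of the cublasDaxpy call described here (untested; the names d_x, d_y and the helper are only illustrative), assuming device arrays of N doubles that already hold data:

```cpp
#include <cublas_v2.h>

// Sketch only: adds the vector d_x into d_y element-wise, i.e. y = 1.0 * x + y.
// Note the result is still an N-element vector, not a single scalar sum.
void add_into(cublasHandle_t handle, int N, const double *d_x, double *d_y)
{
    const double alpha = 1.0;
    cublasDaxpy(handle, N, &alpha, d_x, 1, d_y, 1);
}
```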

  • axpy takes two vectors and produces a vector result. The OP wants to take a single vector (matrix) and produce a scalar result. – Robert Crovella Feb 27 '14 at 02:38