CUDA: Fill matrix with results of summation

Question

I need to fill a matrix with the values returned by the function below:

__device__ float calc(float *ar, int m, float sum, int i, int j)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < m)
    {
        ar[idx] = __powf(ar[idx], i + j);
        atomicAdd(&sum, ar[idx]);
    }
    return sum;
}

The matrix is stored as a one-dimensional array and is filled by this kernel:

__global__ void createMatrix(float *A, float *arr, int size)
{
    A[threadIdx.y*size + threadIdx.x] = /*some number*/;
}

In theory it should be something like this:

__global__ void createMatrix(float *A, float *arr, int size)
{
    float sum = 0;
    A[threadIdx.y*size + threadIdx.x] = calc(arr, size, sum, threadIdx.x, threadIdx.y);
}

But it doesn't work that way: calc always returns 0. Is there any way I can fill the matrix from a __global__ function? Thanks in advance.

einpoklum · Answer 1 · 2019-01-20T16:38:07.510


You're passing sum by value rather than by reference, so none of your atomicAdd() calls have any effect on the zero-initialized value in the kernel.

However, even if you were to pass it by reference, this would still be a poorly-designed kernel. You see, you don't need the atomics if you have a per-thread sum variable (which you do). Also, your calc() function only adds a value once to each sum value, while it seems you expect it to add more than once.
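A minimal sketch of the per-thread-sum idea, assuming the intended result is A[row*size + col] = sum over k of arr[k]^(row + col), which is my reading of the question rather than something the asker confirmed. Each thread loops over the whole input and accumulates into its own local variable, so no atomics are needed and arr is never overwritten. The host helper fillMatrix is a hypothetical name added for illustration, error checking is omitted, and powf is used instead of __powf, since the fast intrinsic is less accurate.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per matrix entry. The sum lives in a register that only this
// thread touches, so plain += is enough and no atomicAdd is required.
__global__ void createMatrix(float *A, const float *arr, int size)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= size || col >= size) return;

    float sum = 0.0f;
    for (int k = 0; k < size; ++k) {
        // Read-only access to arr: other threads need the original values.
        sum += powf(arr[k], static_cast<float>(row + col));
    }
    A[row * size + col] = sum;
}

// Hypothetical host helper (not from the question): copies arr to the
// device, launches the kernel, and returns the size x size matrix.
// Error checking is omitted to keep the sketch short.
std::vector<float> fillMatrix(const std::vector<float> &arr)
{
    const int size = static_cast<int>(arr.size());
    std::vector<float> A(static_cast<size_t>(size) * size);
    if (size == 0) return A;

    float *dA = nullptr;
    float *dArr = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dArr, arr.size() * sizeof(float));
    cudaMemcpy(dArr, arr.data(), arr.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((size + block.x - 1) / block.x, (size + block.y - 1) / block.y);
    createMatrix<<<grid, block>>>(dA, dArr, size);

    // This copy runs on the default stream, so it waits for the kernel.
    cudaMemcpy(A.data(), dA, A.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dArr);
    return A;
}
```

Using block and grid indices rather than threadIdx alone lets the matrix grow beyond a single block. For large inputs the serial loop in each thread becomes the bottleneck, which is where a library reduction is the better fit.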



I see, but what should a proper `calc()` function look like then? There has to be a reduction of the `arr` array after mapping it with `__powf(ar[idx], i + j);`. I'm not sure how to implement this. – Agent_0f_things Jan 20 '19 at 17:04

Just use one of the reductions available in libraries like `cub` or `thrust`, but with an input iterator which applies the power function before passing on the value. If you want a more detailed explanation, read up on these libraries' reduction functionality, and if that doesn't help - ask a separate question. – einpoklum Jan 20 '19 at 23:06
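One way this suggestion might look with Thrust, as a sketch rather than the commenter's own code: `thrust::transform_reduce` applies a unary functor to each element and reduces the results in one pass, which plays the same role as reducing over a transforming input iterator. The functor `PowerOp` and the helper `fillMatrixThrust` are hypothetical names, and the target formula (entry at row, col equals the sum of arr[k]^(row + col)) is an assumption about what the asker wants. Because the exponent depends only on row + col, the sketch computes one reduction per distinct exponent and reuses it.

```cuda
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform_reduce.h>
#include <cmath>
#include <vector>

// Hypothetical functor: raises each input element to a fixed exponent.
struct PowerOp {
    float exponent;
    __host__ __device__ float operator()(float x) const
    {
        return powf(x, exponent);
    }
};

// Hypothetical helper: returns the size x size matrix (row-major) whose
// entry (row, col) is the sum over k of arr[k]^(row + col).
std::vector<float> fillMatrixThrust(const std::vector<float> &arr)
{
    const int size = static_cast<int>(arr.size());
    std::vector<float> A(static_cast<size_t>(size) * size);
    if (size == 0) return A;

    thrust::device_vector<float> dArr(arr.begin(), arr.end());

    // Exponents range over 0 .. 2*size - 2; do one device reduction for each.
    std::vector<float> sums(2 * size - 1);
    for (int e = 0; e < 2 * size - 1; ++e) {
        sums[e] = thrust::transform_reduce(dArr.begin(), dArr.end(),
                                           PowerOp{static_cast<float>(e)},
                                           0.0f, thrust::plus<float>());
    }

    // Fill the matrix on the host from the precomputed sums.
    for (int row = 0; row < size; ++row)
        for (int col = 0; col < size; ++col)
            A[row * size + col] = sums[row + col];
    return A;
}
```

This launches 2*size - 1 reductions instead of size*size, at the cost of filling the matrix on the host. If the matrix has to stay on the device, the sums could be copied back and scattered by a small kernel instead.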
