Matrix row sum in CUDA

Question

I am trying to calculate matrix row sums in CUDA. Since CUDA is used for parallel processing, there should be no need for looping. I have already done matrix addition, and the code is:

__global__ void MatAdd(int A[][N], int B[][N], int C[][N]){
    int i = threadIdx.x;
    int j = threadIdx.y;

    C[i][j] = A[i][j] + B[i][j];
}

But I am not able to convert it into a matrix row sum in the same way. I tried the following code:

__global__ void rowSums(float* matrix, float* sums, int rows, int cols)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N && j < M)
        sums[j] += matrix[i][j];
}

Do you have an actual question to ask? – talonmies Apr 26 '16 at 08:51
I want to ask how a matrix row sum should be done in JCuda. – user3804161 Apr 26 '16 at 09:25

score 0 · Accepted Answer · answered Apr 26 '16 at 11:53

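Your first code sample looks correct as long as the matrix is small enough. Because it indexes only with threadIdx, the whole N x N matrix has to fit in a single thread block (at most 1024 threads on current GPUs, so N <= 32); to cover a larger matrix with blockDim x gridDim threads, the indices must also include blockIdx.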
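As a rough sketch of the addition kernel extended to more than one block, assuming the matrices are stored row-major in flat arrays (the names matAddGrid and matAddOnDevice and the 16 x 16 block size are illustrative choices, not from the question, and error checking is omitted for brevity):

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Element-wise C = A + B for row-major rows x cols matrices in flat arrays.
// Global indices combine blockIdx and threadIdx, so the grid can cover
// matrices larger than one block.
__global__ void matAddGrid(const int* A, const int* B, int* C, int rows, int cols)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows && col < cols) {   // guard threads that fall outside the matrix
        int idx = row * cols + col;
        C[idx] = A[idx] + B[idx];
    }
}

// Hypothetical host helper: copy inputs to the device, launch, copy the result back.
std::vector<int> matAddOnDevice(const std::vector<int>& A, const std::vector<int>& B,
                                int rows, int cols)
{
    size_t count = static_cast<size_t>(rows) * cols;
    assert(A.size() == count && B.size() == count);
    size_t bytes = count * sizeof(int);

    int *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    // Round up so partial tiles at the right and bottom edges are still covered.
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    matAddGrid<<<grid, block>>>(dA, dB, dC, rows, cols);

    std::vector<int> C(count);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

The bounds check matters here because the grid is rounded up to whole blocks, so some threads land outside the matrix.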

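For the second, your matrix needs to be a float**, since you dereference it twice (alternatively, keep the float* and index it as matrix[row * cols + col]). You also need to use the row and col variables, or rename them i and j, and the bounds check should presumably compare against the rows and cols parameters rather than N and M.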
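One possible way to put this together is sketched below, assuming a row-major float* matrix. Note that it swaps in a different scheme from the question's one-thread-per-element attempt: with `sums[...] +=` executed by many threads at once, the updates to the same element would race, so here each thread owns one whole row and loops over its columns. The names rowSumsKernel and rowSumsOnDevice and the block size of 128 are illustrative assumptions, and error checking is omitted.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// One thread per row: each thread walks its own row and writes exactly one
// element of sums, so no two threads ever write to the same location.
__global__ void rowSumsKernel(const float* matrix, float* sums, int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows) {
        float total = 0.0f;
        for (int col = 0; col < cols; ++col)
            total += matrix[row * cols + col];   // row-major flat indexing
        sums[row] = total;
    }
}

// Hypothetical host helper: copy the matrix to the device, launch, copy sums back.
std::vector<float> rowSumsOnDevice(const std::vector<float>& matrix, int rows, int cols)
{
    assert(matrix.size() == static_cast<size_t>(rows) * cols);
    size_t matrixBytes = matrix.size() * sizeof(float);
    size_t sumsBytes = static_cast<size_t>(rows) * sizeof(float);

    float *dMatrix = nullptr, *dSums = nullptr;
    cudaMalloc(&dMatrix, matrixBytes);
    cudaMalloc(&dSums, sumsBytes);
    cudaMemcpy(dMatrix, matrix.data(), matrixBytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (rows + threads - 1) / threads;   // round up to cover every row
    rowSumsKernel<<<blocks, threads>>>(dMatrix, dSums, rows, cols);

    std::vector<float> sums(rows);
    cudaMemcpy(sums.data(), dSums, sumsBytes, cudaMemcpyDeviceToHost);
    cudaFree(dMatrix);
    cudaFree(dSums);
    return sums;
}
```

The loop is per row only; the rows themselves are still processed in parallel. If one thread per element is preferred, accumulating with atomicAdd(&sums[row], value) into a zero-initialized sums array is another option. With JCuda the kernel itself would still be written in CUDA C like this, and only the host-side helper would be replaced by the corresponding JCuda calls.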
