CUDA matrix multiplication: wrong result

Question

This is my code for matrix multiplication, but when I run it I get the correct result for the first row and wrong results for the second and third rows (mostly large negative numbers). This is my first program, so I used some code that I found on the net.

#include <iostream>
#include <cstdio>   // printf

__global__ void MnozenjeMatrica(int* d_c, int* d_a, int* d_b)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    int d = 0;
    for(int i=0; i<3; i++)
    {
        int x = d_a[row * 3 + i];
        int y = d_b[i * 3 + col];
        d += x * y;
    }

    d_c[row * 3 + col] = d;
}

int main()
{
    const int SIZE = 9 * sizeof(int);

    int a[3][3] = {{2, 4, 6}, {1, 3, 5}, {8, 4, 1}};
    int b[3][3] = {{5, 8, 34}, {5, 7, 5}, {1, 4, 31}};
    int c[3][3] = {{5, 8, 34}, {5, 7, 5}, {1, 4, 31}};

    int* d_a;
    int* d_b;
    int* d_c;

    cudaMalloc((void**) &d_a, SIZE);
    cudaMalloc((void**) &d_b, SIZE);
    cudaMalloc((void**) &d_c, SIZE);

    cudaMemcpy(d_a, a, SIZE, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, SIZE, cudaMemcpyHostToDevice);

    MnozenjeMatrica<<<3, 3>>>(d_c, d_a, d_b);
    cudaMemcpy(c, d_c, SIZE, cudaMemcpyDeviceToHost);

    for(int i=0; i<3; i++)
    {
        for(int j=0;  j<3; j++)
        {
            printf("%d\t", c[i][j]);
        }
        printf("\n");
    }
}

Well, I need to resolve the error to get all results correct :) — Bruno Brunolav, May 30 '13 at 12:57
And I need a haircut and a sandwich. That doesn't mean I have a valid Stack Overflow question. And neither, it seems, do you. Questions here are intended to be of use to others who will come afterwards. "My code doesn't work, please help me fix it" rarely falls into that category. — talonmies, May 30 '13 at 13:00

score 2 · Answer 1 · edited May 23 '17 at 12:28

Completely agree with @talonmies.

More suggestions:

Plenty of people have posted questions about CUDA matrix multiplication; you might look at some of those for ideas.
You're not doing any CUDA error checking on the kernel launch or on the CUDA API calls; doing so is recommended.
You might try running your code with cuda-memcheck and see what it reports.
You could debug this kernel pretty quickly with a few well-placed printf statements. This is mostly C code, after all, so consider using basic C troubleshooting techniques.
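
The error-checking and printf suggestions above could be sketched roughly as follows. This is an illustration, not code from the question: the CUDA_CHECK macro and the TraceIndices kernel are hypothetical names introduced here. The kernel uses the same row/col formulas and the same <<<3, 3>>> launch as the question, so the printed indices show what each thread would work on.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original post): abort with a
// readable message if a CUDA runtime call returns an error.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",            \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Hypothetical trace kernel: prints the indices each thread computes,
// using the same row/col formulas as the kernel in the question.
__global__ void TraceIndices()
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block (%u,%u) thread (%u,%u) -> row %d, col %d\n",
           blockIdx.x, blockIdx.y, threadIdx.x, threadIdx.y, row, col);
}

int main()
{
    // Check an API call: allocation of a 3x3 int buffer.
    int* d_buf = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&d_buf, 9 * sizeof(int)));

    TraceIndices<<<3, 3>>>();              // same launch shape as the question
    CUDA_CHECK(cudaGetLastError());        // reports launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());   // reports errors raised while the kernel runs

    CUDA_CHECK(cudaFree(d_buf));
    return 0;
}
```

Built with nvcc, the program can also be run under cuda-memcheck (or compute-sanitizer on recent toolkits, which replaced it). The order of the printed lines is not guaranteed, since threads run concurrently.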

Since I was able to quickly spot this, I can tell you that your kernel is depending on a 2-D threadblock structure to do anything useful:

int row = blockIdx.y * blockDim.y + threadIdx.y;
int col = blockIdx.x * blockDim.x + threadIdx.x;

But you are launching a 1-D grid of 1-D threadblocks:

MnozenjeMatrica<<<3, 3>>>(d_c, d_a, d_b);
                  ^  ^
                  |  1-D threadblock (3 threads)
                  1-D grid (3 blocks)

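So I'm not surprised it only works for a single row. With this launch, blockIdx.y and threadIdx.y are always 0, so row is always 0, while col runs from 0 to 8. Only col values 0 to 2 produce valid results (the first row); the other threads read d_b out of bounds and write garbage into the remaining elements of d_c.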
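
One way to act on this diagnosis is to launch a single 3x3 threadblock, so that threadIdx.y supplies the row and threadIdx.x the column. The following is a minimal sketch under that assumption, reusing the matrices from the question. The N constant, the bounds guard, and the const qualifiers are additions for illustration and are not part of the original code; error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 3  // matrix dimension, as in the question

__global__ void MnozenjeMatrica(int* d_c, const int* d_a, const int* d_b)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    // Guard added in this sketch: threads outside the matrix do nothing.
    if (row >= N || col >= N) return;

    // Dot product of row `row` of A with column `col` of B.
    int d = 0;
    for (int i = 0; i < N; i++)
        d += d_a[row * N + i] * d_b[i * N + col];

    d_c[row * N + col] = d;
}

int main()
{
    const int SIZE = N * N * sizeof(int);

    int a[N][N] = {{2, 4, 6}, {1, 3, 5}, {8, 4, 1}};
    int b[N][N] = {{5, 8, 34}, {5, 7, 5}, {1, 4, 31}};
    int c[N][N] = {};

    int *d_a, *d_b, *d_c;
    cudaMalloc((void**)&d_a, SIZE);
    cudaMalloc((void**)&d_b, SIZE);
    cudaMalloc((void**)&d_c, SIZE);

    cudaMemcpy(d_a, a, SIZE, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, SIZE, cudaMemcpyHostToDevice);

    dim3 block(N, N);  // 2-D threadblock: 3 x 3 threads
    dim3 grid(1, 1);   // a single block covers the whole matrix
    MnozenjeMatrica<<<grid, block>>>(d_c, d_a, d_b);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(c, d_c, SIZE, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < N; j++)
            printf("%d\t", c[i][j]);
        printf("\n");
    }
    // Expected rows: 36 68 274 / 25 49 204 / 61 96 323

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```

For larger matrices the same kernel would need a grid of several 2-D blocks sized to cover all rows and columns, which is where the bounds guard starts to matter.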
