System gets stuck when running matrix multiplication using CUDA

Question

When I run this code, my system freezes after a few seconds and I have to restart it. What am I doing wrong here? Any suggestions would be appreciated.

__global__ void matMul(float* d_M, float* d_N, float* d_P, int width) {
    int row = blockIdx.y*width + threadIdx.y;
    int col = blockIdx.x*width + threadIdx.x;

    if (row < width && col < width) {
        float product_val = 0;
        for (int k = 0; k < width; k++) {
            product_val += d_M[row*width + k] * d_N[k*width + col];
        }
        d_P[row*width + col] = product_val;
    }
}


int main() {
    const int n = 9;
    float* d_M;
    float* d_N;
    float* d_P;

    cudaMallocManaged(&d_M, SIZE * sizeof(float));
    cudaMallocManaged(&d_N, SIZE * sizeof(float));
    cudaMallocManaged(&d_P, SIZE * sizeof(float));

    for (int i = 0; i < n; ++i) {
        d_P[i] = 0;
    }

    int count = 0;
    for (int i = 0; i < n; ++i) {
        d_N[i] = ++count;
    }

    count = 0;
    for (int i = 0; i < n; ++i) {
        d_M[i] = ++count;
    }

    matMul <<<1, n>>> (d_M, d_N, d_P, 3);
    cudaDeviceSynchronize();

    for (int i = 0; i < n; ++i) {
        printf("%f\n", d_P[i]);
    }
    cudaFree(d_N);
    cudaFree(d_M);
    cudaFree(d_P);
    return 0;
}

The code you have posted won't compile, so it is pretty hard to say what might be going wrong. I can see several mistakes, but the behaviour you describe is more likely to be something wrong with your CUDA installation than with your code. If you want a useful answer, please read everything at [MCVE] and edit your question accordingly — talonmies, Oct 07 '18 at 13:08
@talonmies Actually, I'm not getting any errors here. I have already written a vector-vector addition program that worked perfectly, so I'm pretty sure there is no problem with the CUDA installation. — Ajay Kumar, Oct 07 '18 at 13:17
The code you posted (when SIZE is defined and an include added) compiles and runs without error (although there are mistakes, which mean the results are not correct). There is no error in the code that would cause the symptoms you describe. — talonmies, Oct 07 '18 at 17:12
I just ran your code 1000 times in a shell loop with cuda-memcheck. No errors, sorry — talonmies, Oct 07 '18 at 17:35

score -1 · Accepted Answer · answered Oct 07 '18 at 10:15

Assuming that by "stuck" you mean your program reports some kind of error, it's likely that you're accessing invalid memory.

This could happen at the higher indices of d_M and d_N inside the loop, where row*width + k (or k*width + col) may index beyond the memory you allocated with cudaMallocManaged.
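For reference, the guard `row < width && col < width` keeps every access in bounds, provided each buffer holds at least width*width floats (SIZE is not defined in the post, so that is an assumption). What the posted launch does affect is the thread layout: `<<<1, n>>>` creates a one-dimensional block, so threadIdx.y is always 0 and only row 0 of d_P is computed. The following is a minimal sketch of one way to cover the whole matrix, assuming a single 2D block of width x width threads and indices scaled by blockDim rather than width. The names matMulFixed, WIDTH and COUNT are illustrative and do not come from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical corrected kernel: global indices are scaled by blockDim,
// not by the matrix width as in the posted code.
__global__ void matMulFixed(const float* M, const float* N, float* P, int width) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    // The guard keeps out-of-range threads from touching memory.
    if (row < width && col < width) {
        float sum = 0.0f;
        for (int k = 0; k < width; ++k) {
            sum += M[row * width + k] * N[k * width + col];
        }
        P[row * width + col] = sum;
    }
}

int main() {
    const int WIDTH = 3;              // assumed: square WIDTH x WIDTH matrices
    const int COUNT = WIDTH * WIDTH;  // elements per matrix
    float *M, *N, *P;

    cudaMallocManaged(&M, COUNT * sizeof(float));
    cudaMallocManaged(&N, COUNT * sizeof(float));
    cudaMallocManaged(&P, COUNT * sizeof(float));

    // Same inputs as the question: both matrices hold 1..9 in row-major order.
    for (int i = 0; i < COUNT; ++i) {
        M[i] = static_cast<float>(i + 1);
        N[i] = static_cast<float>(i + 1);
        P[i] = 0.0f;
    }

    dim3 block(WIDTH, WIDTH);  // 2D block, so threadIdx.y actually varies
    dim3 grid(1, 1);
    matMulFixed<<<grid, block>>>(M, N, P, WIDTH);
    cudaDeviceSynchronize();

    for (int i = 0; i < COUNT; ++i) {
        printf("%f\n", P[i]);
    }

    cudaFree(M);
    cudaFree(N);
    cudaFree(P);
    return 0;
}
```

With these inputs the product is 30 36 42 / 66 81 96 / 102 126 150. A single block only works for small matrices; larger ones would need a grid of several blocks, which the blockDim-based indexing already allows for.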

It's always good practice in situations like these to add error handling using calls such as cudaPeekAtLastError().
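As a rough illustration of that kind of error handling, the sketch below wraps every runtime call in a checking macro and inspects the launch with cudaPeekAtLastError() followed by cudaDeviceSynchronize(). The CUDA_CHECK macro and the fill kernel are hypothetical names made up for this example; they are not part of the CUDA runtime or of the question's code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the error string with file/line and exit
// if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Trivial illustrative kernel: write one value into each element.
__global__ void fill(float* data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = value;
    }
}

int main() {
    const int n = 9;
    float* data = nullptr;

    CUDA_CHECK(cudaMallocManaged(&data, n * sizeof(float)));

    fill<<<1, n>>>(data, n, 1.0f);
    CUDA_CHECK(cudaPeekAtLastError());    // reports launch failures (e.g. bad configuration)
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    printf("%f\n", data[0]);

    CUDA_CHECK(cudaFree(data));
    return 0;
}
```

Checking both after the launch and after the synchronize matters because kernel launches are asynchronous: a bad launch configuration shows up immediately, while an invalid memory access is typically only reported once the host waits for the kernel to finish.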

This link might be helpful for implementing some debugging.

Given that the code doesn't actually contain any of the defects you have speculated on (I know this because I ran the code), how did this actually answer the question? — talonmies, Oct 11 '18 at 06:45
