I am trying to multiply matrix A (n times) by matrix B. I use a kernel for the matrix multiplication and streams to run the multiplication N times. I have 3 conditions to test consecutively. The first condition runs successfully.
I don't know why it shows an "Invalid Argument" error in the iteration for the second condition. My guess is that I am not cleaning up memory properly. I have done my best to free all host and device variables, and I also tried resetting the CUDA device, but nothing helps. Can anyone help me debug this?
Please find the portion of my code here:
int main(){
    for (int i = 0; i < 3; i++) {
        for (int ind = 0; ind < itr; ind++){
            cudaStreamCreate(&(stream[ind]));
        }
        cudaCheckErrors("cudaStreamCreate fail");
        for (int ind = 0; ind < itr; ind++){
            cudaMemcpyAsync(d_a[ind], h_a[ind], bytes_a, cudaMemcpyHostToDevice, stream[ind]);
        }
        cudaDeviceSynchronize();
        for (int ind = 0; ind < itr; ind++){
            // Launch our kernel
            matrixMul<<<BLOCKS, THREADS, 0, stream[ind]>>>(d_a[ind], b, d_c[ind], M, K, N);
        }
        cudaDeviceSynchronize();
        cudaCheckErrors("kernel fail");
        for (int ind = 0; ind < itr; ind++){
            cudaMemcpyAsync(h_c[ind], d_c[ind], bytes_c, cudaMemcpyDeviceToHost, stream[ind]);
        }
        for (int ind = 0; ind < itr; ind++){
            cudaStreamSynchronize(stream[ind]);
        }
        cudaEventRecord( stop, 0 );
        cudaEventSynchronize( stop );
        cudaEventDestroy( start );
        cudaEventDestroy( stop);
        // Free allocated memory ****The issue was here.******
        cudaFreeHost(h_a);
        cudaFree(b);
        cudaFreeHost(h_c);
        cudaFree(d_a);
        cudaFree(d_c);
        cudaDeviceReset();
    }
    return 0;
}
In the second iteration I got this error:
Fatal error: cudaStreamCreate fail (invalid argument at /tmp/tmpwgpzgk9m/73a7502c-7662-4e80-804e-4debff15dc45.cu:140)
*** FAILED - ABORTING
Solved:
The error was caused by a memory leak. I was allocating arrays of pointers but only freeing the first one. As Robert suggested in the answer below, the memory should be freed for each index of the array. Also, please always use proper CUDA error checking.
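As a rough sketch of the fix described above, the following program allocates and frees every element of the pointer arrays inside the loop and checks for errors after each stage. The macro is written to match the message format in the error output, but its real definition is not shown in the post, so treat it as an assumption. ITR, BYTES, and runOnce are hypothetical names, and the matrix kernel is left out. Assuming h_a, d_a and the others are arrays of pointers, passing the array itself to cudaFree or cudaFreeHost likely hands the runtime an address it never allocated, and a check based on cudaGetLastError would only report that at the next check, which may be why the message pointed at cudaStreamCreate.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Error-check macro in the style used in the question. The exact definition
// there is not shown, so this one is an assumption.
#define cudaCheckErrors(msg)                                        \
    do {                                                            \
        cudaError_t err_ = cudaGetLastError();                      \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", msg, \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            fprintf(stderr, "*** FAILED - ABORTING\n");             \
            exit(1);                                                \
        }                                                           \
    } while (0)

const int ITR = 4;             // hypothetical number of streams and buffers
const size_t BYTES = 1 << 20;  // hypothetical size of each buffer in bytes

// Allocate, use, and release ITR pinned host buffers, device buffers,
// and streams for one test condition.
void runOnce() {
    float *h_a[ITR];
    float *d_a[ITR];
    cudaStream_t stream[ITR];

    for (int ind = 0; ind < ITR; ind++) {
        cudaMallocHost(&h_a[ind], BYTES);  // pinned host memory, one per index
        cudaMalloc(&d_a[ind], BYTES);      // device memory, one per index
        cudaStreamCreate(&stream[ind]);
    }
    cudaCheckErrors("allocation fail");

    for (int ind = 0; ind < ITR; ind++) {
        memset(h_a[ind], 0, BYTES);
        cudaMemcpyAsync(d_a[ind], h_a[ind], BYTES,
                        cudaMemcpyHostToDevice, stream[ind]);
    }
    cudaDeviceSynchronize();
    cudaCheckErrors("copy fail");

    // Release every element, not the pointer array itself.
    for (int ind = 0; ind < ITR; ind++) {
        cudaStreamDestroy(stream[ind]);
        cudaFreeHost(h_a[ind]);
        cudaFree(d_a[ind]);
    }
    cudaCheckErrors("cleanup fail");
}

int main() {
    for (int i = 0; i < 3; i++) {
        runOnce();
    }
    printf("all iterations completed\n");
    return 0;
}
```

Checking right after the cleanup loop means a bad free is reported where it happens instead of at the first check of the next iteration.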