Here is my code:
int threadNum = BLOCKDIM/8;
dim3 dimBlock(threadNum,threadNum);
int blocks1 = nWidth/threadNum + (nWidth%threadNum == 0 ? 0 : 1);
int blocks2 = nHeight/threadNum + (nHeight%threadNum == 0 ? 0 : 1);
dim3 dimGrid;
dimGrid.x = blocks1;
dimGrid.y = blocks2;
// dim3 numThreads2(BLOCKDIM);
// dim3 numBlocks2(numPixels/BLOCKDIM + (numPixels%BLOCKDIM == 0 ? 0 : 1) );
perform_scaling<<<dimGrid,dimBlock>>>(imageDevice,imageDevice_new,min,max,nWidth, nHeight);
cudaError_t err = cudaGetLastError();
cudasafe(err,"Kernel2");
This is the execution of my second kernel and it is fully independent in term of the usage of data. BLOCKDIM
is 512 , nWidth and nHeight
are 512 too and cudasafe
simply prints the corresponding string message of the error code. This section of the code gives configuration error just after the kernel call.
What might give this error, any idea?