I have a question about whether my code can run on my current device. I want to do a 3D interpolation, but when I launch my interpolation kernel I get the following error: kernel failure: invalid configuration argument
I saw in this discussion that this can happen if you launch too many threads or blocks, but I am not sure that is the case in my code. Could someone have a look and tell me what's wrong?
Here is how I call my kernel:
dim3 blockSize(6,6,6);
dim3 threadSize(dimX/blockSize.x,dimY/blockSize.y,dimZ/blockSize.z);
d_interpolate_kernel<<<blockSize,threadSize>>>(output,dimX,dimY,dimZ);
My dimensions are dimX = 54 or 108 and dimY = dimZ = 42 or 84, so I have blockSize(6,6,6) and threadSize(9,7,7) or (18,14,14).
My card has the following capabilities:
MAX_BLOCK_DIM_X = 512
MAX_BLOCK_DIM_Y = 512
MAX_BLOCK_DIM_Z = 64
MAX_GRID_DIM_X = 65535
MAX_GRID_DIM_Y = 65535
MAX_GRID_DIM_Z = 1
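To make the comparison with these limits concrete, here is a rough host-side sketch in plain C++. `Dim3`, `Limits`, `withinLimits`, `threadsPerBlock` and `launchFits` are hypothetical names, not CUDA API; `Dim3` only stands in for `dim3` so the snippet compiles without the toolkit. The value `maxThreadsPerBlock = 512` is an assumption, since the total-threads-per-block limit is not in the list above and would normally be read from the device. The one CUDA fact it relies on is that in `kernel<<<a, b>>>` the first argument is the grid size (in blocks) and the second is the block size (in threads), whatever the variables are called.

```cpp
// Hypothetical stand-in for CUDA's dim3, so this compiles without the toolkit.
struct Dim3 { unsigned x, y, z; };

struct Limits {
    Dim3 maxBlock{512, 512, 64};        // MAX_BLOCK_DIM_X/Y/Z from the list above
    Dim3 maxGrid{65535, 65535, 1};      // MAX_GRID_DIM_X/Y/Z from the list above
    unsigned maxThreadsPerBlock = 512;  // ASSUMPTION: not in the list above
};

// Per-axis check: every component is at least 1 and at most its limit.
bool withinLimits(const Dim3& d, const Dim3& max) {
    return d.x >= 1 && d.x <= max.x &&
           d.y >= 1 && d.y <= max.y &&
           d.z >= 1 && d.z <= max.z;
}

// Total number of threads in one block.
unsigned threadsPerBlock(const Dim3& block) {
    return block.x * block.y * block.z;
}

// grid  = first launch argument  (number of blocks per axis)
// block = second launch argument (number of threads per axis)
bool launchFits(const Dim3& grid, const Dim3& block, const Limits& lim) {
    return withinLimits(grid, lim.maxGrid) &&
           withinLimits(block, lim.maxBlock) &&
           threadsPerBlock(block) <= lim.maxThreadsPerBlock;
}
```

Under that assumption, `launchFits(Dim3{6, 6, 6}, Dim3{9, 7, 7}, Limits{})` is false because the first launch argument has a z extent of 6, and a (18,14,14) block would hold 18 x 14 x 14 = 3528 threads.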
Do I get the error because MAX_GRID_DIM_Z is 1? If yes, is there a way around this?
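If the z limit on the grid does turn out to be the problem, one workaround that is often suggested is to stack the z slices of the grid along y and recover the 3D block index inside the kernel. The sketch below shows only the index arithmetic in plain C++. `Dim3`, `foldGrid` and `unfoldBlockIdx` are hypothetical names (`Dim3` stands in for `dim3`); in a real kernel the `launched` argument would be `blockIdx`, and `logicalY` would have to be passed in as a kernel parameter. It assumes the product of the logical y and z extents still fits within MAX_GRID_DIM_Y.

```cpp
// Hypothetical stand-in for CUDA's dim3, so this compiles without the toolkit.
struct Dim3 { unsigned x, y, z; };

// Fold a logical 3D grid into a 2D launch grid: z slices are stacked along y.
// ASSUMPTION: logical.y * logical.z does not exceed the device's y grid limit.
Dim3 foldGrid(const Dim3& logical) {
    return Dim3{logical.x, logical.y * logical.z, 1};
}

// Recover the logical 3D block index from the index in the folded 2D grid.
// logicalY is the y extent of the original (unfolded) grid.
Dim3 unfoldBlockIdx(const Dim3& launched, unsigned logicalY) {
    return Dim3{launched.x,
                launched.y % logicalY,   // position inside the z slice
                launched.y / logicalY};  // which z slice this block belongs to
}
```

For a logical grid of (6,6,6), `foldGrid` gives (6,36,1), and the folded block index (2,23,0) maps back to the logical index (2,5,3).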
Thank you!