I use x and y to compute the cells of a matrix on the device.
When I use values greater than 32 for lenA and lenB, the breakpoint (at int x= threadIdx.x;
in the device code) is never hit and the output is incorrect.
In host code:
int lenA=52;
int lenB=52;
dim3 threadsPerBlock(lenA, lenB);
dim3 numBlocks(lenA / threadsPerBlock.x, lenB / threadsPerBlock.y);
kernel_matrix<<<numBlocks,threadsPerBlock>>>(dev_A, dev_B);
In device code:
int x= threadIdx.x;
int y= threadIdx.y;
...
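
A note on the numbers involved: dim3 threadsPerBlock(52, 52) asks for 52 * 52 = 2704 threads in a single block, while current CUDA GPUs allow at most 1024 threads per block (32 * 32 = 1024, which lines up with the point where things stop working). A launch over that limit fails, so the kernel never runs and the breakpoint is never reached. Below is a minimal sketch of one possible way around it: a fixed block size, a grid rounded up to cover the matrix, and global indices built from blockIdx. The int element type, the extra lenA/lenB kernel parameters, the TILE constant, the ceil_div helper and the copy in the kernel body are all assumptions for illustration; the real kernel_matrix body is not shown above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical tile edge: 16 x 16 = 256 threads per block, well under 1024.
constexpr int TILE = 16;

// Integer ceiling division, so partial tiles at the matrix edge still get a block.
inline int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Placeholder body (assumption): copies A to B, one thread per cell.
__global__ void kernel_matrix(const int *dev_A, int *dev_B, int lenA, int lenB)
{
    // Global cell coordinates: the block offset plus the thread offset.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    // The grid is rounded up, so some threads fall outside the matrix.
    if (x >= lenA || y >= lenB) return;

    dev_B[y * lenA + x] = dev_A[y * lenA + x];
}

int main()
{
    int lenA = 52;
    int lenB = 52;
    size_t bytes = sizeof(int) * lenA * lenB;

    int *dev_A = nullptr;
    int *dev_B = nullptr;
    cudaMalloc(&dev_A, bytes);
    cudaMalloc(&dev_B, bytes);
    cudaMemset(dev_A, 0, bytes);

    dim3 threadsPerBlock(TILE, TILE);
    dim3 numBlocks(ceil_div(lenA, TILE), ceil_div(lenB, TILE));
    kernel_matrix<<<numBlocks, threadsPerBlock>>>(dev_A, dev_B, lenA, lenB);

    // An invalid launch configuration is reported here, not inside the kernel.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));

    // Errors raised while the kernel runs surface at the next synchronization.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(dev_A);
    cudaFree(dev_B);
    return 0;
}
```

With lenA = lenB = 52 this launches a 4 x 4 grid of 16 x 16 blocks. Checking cudaGetLastError() right after the original launch should also show whether the launch itself is being rejected.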