The following code does not work. I expect every y[i] to equal 3 after the kernel add() runs. But if N >= (1 << 24) - 255, every y[i] is still 2, as if the kernel never ran.

#include <iostream>
__global__ void add(int n, int *x, int *y) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride) y[i] = x[i] + y[i];
}
int main() {
    int *x, *y, N = (1 << 24) - 255; // 255 wrong / 256 ok
    cudaMallocManaged(&x, N * sizeof(int));
    cudaMallocManaged(&y, N * sizeof(int));
    for (int i = 0; i < N; ++i) {x[i] = 1; y[i] = 2;}
    int sz = 256;
    dim3 blockDim(sz,1,1);
    dim3 gridDim((N+sz-1)/sz,1,1);
    add<<<gridDim, blockDim>>>(N, x, y);
    cudaDeviceSynchronize();
    for (int i = 0; i < N; ++i) if (y[i]!=3) std::cout << "error" << std::endl;
    cudaFree(x);
    cudaFree(y);
    return 0;
}

The GPU is a GTX1080Ti and has the following limits:

Maximum number of threads per block:           1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)

The machine is x86_64 Linux (Ubuntu 16.04). Am I doing something wrong here? Please help.

Jihyun
  • [Proper CUDA error checking](https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) would help to focus your attention on the problem. – Robert Crovella Jul 16 '17 at 15:50
  • Thanks! I added gpuErrchk( cudaPeekAtLastError() ); right after the kernel function add call. Then it returned "GPUassert: invalid argument test.cu 42" when I compiled it without -arch=sm_60. – Jihyun Jul 16 '17 at 17:01

1 Answer

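I did not pass -arch= when compiling, so nvcc used its default, -arch=sm_20. Compute capability 2.x limits the x dimension of a grid to 65535 blocks, and with 256 threads per block, N = (1 << 24) - 255 needs 65536 blocks, so the launch was rejected and the kernel never ran. Compiling with -arch=sm_60 fixes it, because the x dimension of a grid can be as large as 2147483647 for compute capability 3.0 or above.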

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
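To make the threshold concrete, here is a minimal sketch of the block-count arithmetic from the question. It is plain host-side C++, so it should also compile under nvcc. The names `blocks_needed`, `launch_fits`, and the two `kMaxGridX...` constants are hypothetical, made up for this illustration. The limits are the ones quoted above: 65535 for compute capability 2.x and 2147483647 for 3.0 and later.

```cpp
#include <cstdint>

// Grid x-dimension limits as quoted in the answer above.
// The constant names are illustrative, not part of any CUDA header.
constexpr std::int64_t kMaxGridXSm2x = 65535;           // compute capability 2.x
constexpr std::int64_t kMaxGridXSm3Plus = 2147483647;   // compute capability 3.0+

// Same ceiling division the question uses: (N + sz - 1) / sz.
constexpr std::int64_t blocks_needed(std::int64_t n, std::int64_t block_size) {
    return (n + block_size - 1) / block_size;
}

// Hypothetical helper: true if a 1-D launch covering n elements
// stays within the given grid x-dimension limit.
constexpr bool launch_fits(std::int64_t n, std::int64_t block_size,
                           std::int64_t max_grid_x) {
    return blocks_needed(n, block_size) <= max_grid_x;
}

// N = (1 << 24) - 256 needs exactly 65535 blocks: fits under the 2.x limit.
static_assert(blocks_needed((1 << 24) - 256, 256) == 65535, "256 case");
static_assert(launch_fits((1 << 24) - 256, 256, kMaxGridXSm2x), "256 fits");

// N = (1 << 24) - 255 needs 65536 blocks: one more than the 2.x limit allows.
static_assert(blocks_needed((1 << 24) - 255, 256) == 65536, "255 case");
static_assert(!launch_fits((1 << 24) - 255, 256, kMaxGridXSm2x), "255 too big");
static_assert(launch_fits((1 << 24) - 255, 256, kMaxGridXSm3Plus), "ok on 3.0+");
```

This matches the "255 wrong / 256 ok" comment in the question. Since add() already uses a grid-stride loop, launching with the block count capped at the limit would presumably also have produced correct results, at the cost of each thread handling more than one element.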

Jihyun