
I'm a newbie in CUDA programming. I have a problem with this code (it was written by my teacher):

#include <stdio.h>

#define THREAD_PER_BLOCK 128

__global__ void add(int *a,const int N){
    int index=threadIdx.x+blockIdx.x*blockDim.x;
    if (index<N)
      a[index] = a[index]+10;
}
int main( void ){
    int *a;
    // allocated below with cudaMallocManaged (unified memory)
    int i;
    int N=1024;
    int size = N * sizeof( int );

    cudaMallocManaged( &a, size );
    for(i=0; i<N; i++) {
        a[i]=i;
    }

    add<<< N/THREAD_PER_BLOCK, THREAD_PER_BLOCK >>>( a,N);
    cudaDeviceSynchronize();
    for (int i=0; i<10; i++){
        printf("%d %d\n", i, a[i]);
    }
    cudaFree( a );
    return 0;
}

I get a segmentation fault in the first for-loop, and I have no idea why the program crashes. My operating system is Ubuntu 14.04, and this is the output of deviceQuery:

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 820M"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    2.1
 Total amount of global memory:                 1985 MBytes (2081095680 bytes)
  ( 2) Multiprocessors, ( 48) CUDA Cores/MP:     96 CUDA Cores
  GPU Max Clock rate:                            1550 MHz (1.55 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 131072 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime  Version = 8.0, NumDevs = 1, Device0 = GeForce 820M
Result = PASS
talonmies
alukard990

1 Answer


The problem here is that your GPU is a Fermi GPU (compute capability 2.x):

Device 0: "GeForce 820M"
 ...
 CUDA Capability Major/Minor version number:    2.1
                                                ^^^

and unified memory (which cudaMallocManaged allocates) requires a GPU with compute capability 3.0 or higher.
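A minimal sketch of checking for unified-memory support at runtime before relying on it, assuming device 0 is the GPU in question. It reads the real major and managedMemory fields of cudaDeviceProp; the helper names supportsManagedMemory, deviceSupportsManagedMemory and addWithExplicitCopies are hypothetical. On a compute capability 2.x device such as the GeForce 820M, one option is to skip unified memory and manage device memory explicitly with cudaMalloc and cudaMemcpy, which is what the fallback function does.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREAD_PER_BLOCK 128

// Same kernel as in the question.
__global__ void add(int *a, const int N) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < N)
        a[index] = a[index] + 10;
}

// Pure host-side rule (hypothetical helper): unified memory needs compute
// capability 3.0 or higher, and the runtime must also report
// managed-memory support for this device on this system.
bool supportsManagedMemory(int major, int managedMemoryFlag) {
    return major >= 3 && managedMemoryFlag != 0;
}

// Queries the device's properties and applies the rule above.
bool deviceSupportsManagedMemory(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return false;
    return supportsManagedMemory(prop.major, prop.managedMemory);
}

// Fallback without unified memory (hypothetical helper): copy the host
// array h (n ints) to the device, run the kernel, and copy the result back.
// Returns false if any CUDA call fails.
bool addWithExplicitCopies(int *h, int n) {
    size_t bytes = n * sizeof(int);
    int *d = NULL;
    if (cudaMalloc(&d, bytes) != cudaSuccess)
        return false;
    bool ok = cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Round up so a size that is not a multiple of the block size is covered.
        int blocks = (n + THREAD_PER_BLOCK - 1) / THREAD_PER_BLOCK;
        add<<<blocks, THREAD_PER_BLOCK>>>(d, n);
        // The device-to-host cudaMemcpy waits for the kernel to finish,
        // so errors raised while the kernel runs are reported here too.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d);
    return ok;
}
```

A caller would test deviceSupportsManagedMemory(0) first and only use cudaMallocManaged when it returns true; otherwise it would fill an ordinary host array and pass it to addWithExplicitCopies.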

Any time you are having trouble with CUDA code, it's good practice to add proper CUDA error checking before asking others for help. Even if you don't understand the error output, it will be useful to others trying to help you. In this case, you would have gotten a concise error message saying that the cudaMallocManaged operation is not supported.
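A minimal sketch of the kind of error checking meant here, applied to the question's program. The checkCuda function, the CUDA_CHECK macro and runChecked are hypothetical names (a common pattern, not part of the CUDA API); cudaGetErrorString, cudaGetLastError and cudaDeviceSynchronize are real runtime calls. The kernel launch itself returns nothing, so launch errors are picked up with cudaGetLastError and errors during execution with cudaDeviceSynchronize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREAD_PER_BLOCK 128

// Hypothetical helper: prints a message for a failed CUDA runtime call and
// returns false; returns true on success.
bool checkCuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical macro: passes the source text of the call along for the message.
#define CUDA_CHECK(call) checkCuda((call), #call)

__global__ void add(int *a, const int N) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < N)
        a[index] = a[index] + 10;
}

// The question's program with every runtime call checked.
// Returns 0 on success, 1 on the first CUDA error.
int runChecked() {
    const int N = 1024;
    int *a = NULL;
    // If unified memory is unsupported, the failure is reported here instead
    // of as a segfault when the loop below writes through an unset pointer.
    if (!CUDA_CHECK(cudaMallocManaged(&a, N * sizeof(int))))
        return 1;
    for (int i = 0; i < N; i++)
        a[i] = i;

    add<<<N / THREAD_PER_BLOCK, THREAD_PER_BLOCK>>>(a, N);
    // cudaGetLastError catches launch errors; cudaDeviceSynchronize catches
    // errors raised while the kernel runs.
    if (!CUDA_CHECK(cudaGetLastError()) || !CUDA_CHECK(cudaDeviceSynchronize())) {
        cudaFree(a);
        return 1;
    }
    for (int i = 0; i < 10; i++)
        std::printf("%d %d\n", i, a[i]);
    return CUDA_CHECK(cudaFree(a)) ? 0 : 1;
}
```

Called from main (for example, return runChecked();) on the GeForce 820M, this should, going by the answer above, stop with an error message from the cudaMallocManaged check rather than crash in the first loop.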

Robert Crovella