
I have written a parallel CUDA program on Windows (GeForce GT 720M). I have installed the CUDA 9.0 Toolkit and Visual Studio 2013. Everything seems fine, but when I compile and run the code the output is wrong.

The program is:

#include <stdio.h>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

__global__ void square(float * d_out, float * d_in)
{
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = 50;
}

int main(int argc, char ** argv)
{
    const int ARRAY_SIZE = 64;
    const int ARRAY_BYTES = ARRAY_SIZE * sizeof(float);

    // generate the input array on the host
    float h_in[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        h_in[i] = float(i);
    }
    float h_out[ARRAY_SIZE];

    // declare GPU memory pointers
    float * d_in;
    float * d_out;

    // allocate GPU memory
    cudaMalloc((void **) &d_in, ARRAY_BYTES);
    cudaMalloc((void **) &d_out, ARRAY_BYTES);

    // transfer the array to the GPU
    cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);

    // launch the Kernel
    square<<<1, ARRAY_SIZE>>>(d_out, d_in);

    // copy the result array back to the host
    cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);

    // print out the resulting array
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        printf("%f", h_out[i]);
        printf(((i % 4) != 3) ? "\t" : "\n");
    }

    // free GPU memory allocation
    cudaFree(d_in);
    cudaFree(d_out);

    getchar();
    return 0;
}

When I run it, the output is wrong: [screenshot: Square program output]

I also compiled it with nvcc square.cu, but the output is the same. Visual Studio shows a kernel launch syntax error, but I think that is not related to the wrong output (the image below is from another program):

[screenshot: kernel launch syntax error in Visual Studio]

Saeed Rahmani
  • start by adding [proper cuda error checking](https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) to your code. Then recompile/run and report any errors that are indicated (see the sketch after these comments). – Robert Crovella Dec 15 '17 at 17:29
  • @RobertCrovella Which answer should I use? – Saeed Rahmani Dec 15 '17 at 17:37
  • @RobertCrovella CUDA error: no kernel image is available for execution on the device – Saeed Rahmani Dec 15 '17 at 17:40
  • 1
    so are not compiling your for an architecture which will run on your GPU – talonmies Dec 15 '17 at 18:18
  • 2
    The GeForce GT 720M is a [GF117 (Fermi-based) GPU](https://www.notebookcheck.net/NVIDIA-GeForce-GT-720M.90247.0.html). CUDA 9 dropped support for Fermi devices. CUDA 8 is the latest CUDA toolkit that still supports Fermi GPUs. – Robert Crovella Dec 15 '17 at 18:26
  • @RobertCrovella Thanks. I saw it on the Wikipedia page. I will install the CUDA 8. – Saeed Rahmani Dec 15 '17 at 19:01
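
For reference, here is a minimal sketch of the error-checking pattern suggested in the comments, applied to the launch in this question. The CUDA_CHECK macro name is an illustration, not part of the original code; it simply wraps each runtime call and prints the error string on failure:

#include <stdio.h>
#include <stdlib.h>
#include "cuda_runtime.h"

// Hypothetical helper macro (not from the original program): report and exit on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err));     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Usage around the kernel launch from the question:
//   CUDA_CHECK(cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice));
//   square<<<1, ARRAY_SIZE>>>(d_out, d_in);
//   CUDA_CHECK(cudaGetLastError());        // catches launch errors, e.g. "no kernel image is available"
//   CUDA_CHECK(cudaDeviceSynchronize());   // catches errors raised while the kernel runs
//   CUDA_CHECK(cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost));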

2 Answers


The problem was the CUDA Toolkit version. The GeForce GT 720M has compute capability 2.1 (Fermi), which is supported by CUDA 8.0 but not by CUDA 9.0.
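
As a quick way to verify this on a given machine (a minimal sketch, not part of the original answer), the compute capability can be queried at runtime with cudaGetDeviceProperties; on a GT 720M it should report 2.1:

#include <stdio.h>
#include "cuda_runtime.h"

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);          // query device 0
    printf("Device: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);  // expected: 2.1 on a GeForce GT 720M
    return 0;
}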

Saeed Rahmani

Here is a table of CUDA Toolkit versions together with the compute capabilities they support.

See the section "GPUs supported".
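
To cross-check a machine against that table, a small sketch like the following (assuming the CUDA runtime is installed) prints the runtime and driver versions as integers, e.g. 8000 for CUDA 8.0:

#include <stdio.h>
#include "cuda_runtime.h"

int main()
{
    int runtimeVersion = 0, driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);  // version of the CUDA runtime the program links against
    cudaDriverGetVersion(&driverVersion);    // latest CUDA version supported by the installed driver
    printf("Runtime: %d, Driver: %d\n", runtimeVersion, driverVersion);
    return 0;
}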

CarpeDiemKopi
  • Cannot believe that in 2020 the installer does not check your hardware and warn you that it is unsupported. This is bananas. I spent hours banging my head against the wall because I could copy data from host to device and vice versa but could not call a __global__ function. – Ali Nov 24 '20 at 04:43