I am attempting to use dynamic parallelism on a GTX 980 Ti card. Every attempt at running the code returns "unknown error". A simple example is shown below, along with the compilation options.

I can execute kernels at depth=0 with no issues. The error appears the first time a child kernel is launched. The cudaDeviceSynchronize() calls were added after looking at other questions here, but they didn't solve the problem.

Any ideas? Could this be a drivers issue?

Edit 1:

OS: Linux-x86_64

Nvidia driver version: 384.59

nvcc version 7.5.17

There are two 980 Ti cards connected via PCIe x16 Gen3. The system also has Windows installed on a separate RAID-configured SSD.

#include <cuda.h>
#include <fstream>
#include <stdio.h>
#include <stdlib.h>

// Child kernel: does trivial work.
__global__ void ker_two(){
    int two=0;
    two++;
}

// Parent kernel: launches the child from the device, then waits for it.
__global__ void ker_one(){
    int one=0;
    one++;
    ker_two<<<1,1>>>();
    cudaDeviceSynchronize();
}

int main( ){

    ker_one<<<1,1>>>();
    cudaDeviceSynchronize();

    // Report any error left by the launch or the synchronization.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("Cuda Error: %s\n", cudaGetErrorString(err));

    return 0;
}

compiled with

nvcc -arch=compute_52 -rdc=true -lcudadevrt test.cu
AshleyG
  • I don't have any trouble with your code and compile command. You don't say anything about your environment (OS, driver version, CUDA version). – Robert Crovella Jul 30 '17 at 19:38
  • "I am attempting dynamic parallelism" - Frankly? Don't bother. The way it is now it's almost never worth it, if at all. – einpoklum Jul 31 '17 at 20:45

1 Answer


I am able (?) to reproduce the error on a machine with a Maxwell Titan card. It's a Fedora 24 distribution with CUDA 8.0.61 installed manually. Driver version is 375.51.

However, on my system the problem only seems to occur when I call cudaDeviceSynchronize() within ker_one(), regardless of whether I launch the second kernel or not. So maybe that's the problem you're seeing, rather than dynamic parallelism per se.
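One way to check this observation on another machine is to separate the two suspects: a parent that only launches a child, and a parent that only synchronizes on the device. The following is a minimal sketch, not a definitive diagnosis. The kernel names child, parent_launch_only and parent_sync_only and the CUDA_CHECK macro are made up for illustration, and it assumes a toolkit that still supports device-side cudaDeviceSynchronize(), such as the CUDA 7.5 and 8.0 releases discussed here, built with the same -rdc=true -lcudadevrt options as in the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the failing call's location and leave main().
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Empty child kernel, launched from the device.
__global__ void child() {}

// Launches a child but never synchronizes on the device.
// The parent grid still waits for its child before it completes.
__global__ void parent_launch_only() {
    child<<<1, 1>>>();
}

// Synchronizes on the device without launching anything.
__global__ void parent_sync_only() {
    cudaDeviceSynchronize();
}

int main() {
    parent_launch_only<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised during execution
    printf("launch only: ok\n");

    parent_sync_only<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    printf("sync only: ok\n");
    return 0;
}
```

The program stops at the first failing step, so if the first test fails, swap the two tests and rerun to find out whether the second one fails on its own.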

Considering @talonmies' comment, this might even be just a driver issue.
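To compare setups when a driver problem is suspected, it can help to print what the runtime itself reports instead of relying on package versions. This is a rough sketch using only standard CUDA runtime calls; the output format is arbitrary. Both version numbers are encoded as 1000 * major + 10 * minor, and dynamic parallelism needs compute capability 3.5 or higher.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0;
    int runtimeVersion = 0;

    // Latest CUDA version supported by the installed driver.
    if (cudaDriverGetVersion(&driverVersion) != cudaSuccess) return 1;
    // CUDA runtime version this binary was built against.
    if (cudaRuntimeGetVersion(&runtimeVersion) != cudaSuccess) return 1;

    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 1000) / 10,
           runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // List every visible device with its compute capability.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return 1;
        printf("device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

This file has no device code, so it builds with a plain nvcc invocation. If the driver reports an older CUDA version than the runtime, that mismatch is worth ruling out first.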

einpoklum
  • To counter that, I have compiled and run the MCVE code under both CUDA 7.5 and CUDA 8 with the 367.48 driver and a Maxwell card and cannot reproduce the error – talonmies Aug 02 '17 at 06:41