
Consider the following CUDA program, in a file named foo.cu:

#include <cooperative_groups.h>
#include <stdio.h>

__global__ void my_kernel() {
    auto g = cooperative_groups::this_grid();
    g.sync();
}

int main(int, char **) {
    cudaLaunchCooperativeKernel( (const void*) my_kernel, 2, 2, nullptr, 0, nullptr);
    cudaDeviceSynchronize();
}

This program doesn't do much, but it's a valid program (provided your device's compute capability is high enough to support the entire grid as a cooperative group). It should compile, link, and run. However, I get this:

$ nvcc -o foo  -gencode arch=compute_61,code=sm_61 foo.cu 
ptxas fatal   : Unresolved extern function 'cudaCGGetIntrinsicHandle'

Surprising! It doesn't help if I add some specific -l and -L flags, e.g.:

$ nvcc -o foo  -gencode arch=compute_61,code=sm_61 foo.cu -L"/usr/lib/x86_64-linux-gnu/" \
-L"/usr/lib/x86_64-linux-gnu/stubs"  -lcudadevrt -lcudart_static -lrt -lpthread -ldl
ptxas fatal   : Unresolved extern function 'cudaCGGetIntrinsicHandle'

Why is this happening? And how should I modify the nvcc command-line to make it find that unresolved symbol?

Notes:

  • I'm using Devuan GNU/Linux 3.0.
  • CUDA 10.1 is installed as a distribution package, so that its libraries are under /usr/lib/x86_64-linux-gnu.
  • An x86_64 machine with a GeForce 1050 Ti card.
einpoklum
  • Cooperative groups with grid sync requires relocatable device code linking. If you look at any of the CUDA sample codes that do this, you'll find a properly constructed Makefile which indicates how to build. So add `-rdc=true` to your compile command line. – Robert Crovella Dec 25 '19 at 20:14
  • @RobertCrovella: Adding `-rdc=true` doesn't solve the problem, but adding `-rdc=true` _and_ all those `-L` and `-l`s does. – einpoklum Dec 25 '19 at 20:25
  • @RobertCrovella: See [followup question](https://stackoverflow.com/questions/59481689/why-does-my-nvcc-refuse-to-look-into-usr-lib-x86-64-linux-gnu?noredirect=1&lq=1). – einpoklum Dec 25 '19 at 20:38
  • In a supported development environment, according to my testing, the only thing required to compile this code is: `nvcc -arch=sm_60 -o t1 t1.cu -rdc=true -lcudadevrt` (the `arch` switch can be changed to any arch that supports grid sync, which should be cc6.0 or higher) – Robert Crovella Dec 25 '19 at 21:25
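
The comments above can be pulled together into a minimal sketch: the program from the question with a runtime check for cooperative-launch support and with error checking on every runtime call, so that an unsupported device or a failed launch is reported instead of passing silently. The `check` helper and the `cg` namespace alias are hypothetical names introduced only for this illustration. The build line in the leading comment combines the question's `-gencode` option with the `-rdc=true -lcudadevrt` flags from the last comment; it is an assumption that this is sufficient on a given installation, since the asker reports also needing explicit `-L` paths.

```cuda
// Assumed build line (adapt the architecture and add -L paths if your setup needs them):
//   nvcc -o foo -gencode arch=compute_61,code=sm_61 -rdc=true foo.cu -lcudadevrt
#include <cooperative_groups.h>
#include <cstdio>
#include <cstdlib>

namespace cg = cooperative_groups;  // alias used only in this sketch

__global__ void my_kernel() {
    cg::grid_group g = cg::this_grid();
    g.sync();  // grid-wide barrier; per the comments, this is what requires -rdc=true
}

// Hypothetical helper: print a readable message and exit on any runtime API error.
static void check(cudaError_t status, const char *what) {
    if (status != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    int device = 0;
    check(cudaGetDevice(&device), "cudaGetDevice");

    // Ask the runtime whether this device supports cooperative launches at all.
    int supported = 0;
    check(cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, device),
          "cudaDeviceGetAttribute");
    if (!supported) {
        std::fprintf(stderr, "device %d does not support cooperative launch\n", device);
        return EXIT_FAILURE;
    }

    // Same launch as in the question: 2 blocks of 2 threads, no kernel arguments,
    // no dynamic shared memory, default stream.
    check(cudaLaunchCooperativeKernel((const void *) my_kernel, dim3(2), dim3(2),
                                      nullptr, 0, nullptr),
          "cudaLaunchCooperativeKernel");
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    std::puts("cooperative launch completed");
    return EXIT_SUCCESS;
}
```

With the checks in place, a build that links but runs on an unsuitable device fails with an explicit message, which helps separate link-time problems (such as the unresolved `cudaCGGetIntrinsicHandle`) from run-time ones.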
