
I am attempting to link a CUDA kernel into a C++ autotools project, however I cannot seem to get past the linking stage.

I have a file GPUFloydWarshall.cu that contains the kernel and a wrapper C function, which I would like to place into a library libgpu.a. This would be consistent with the remainder of the project. Is this at all possible?

Secondly, the library would then need to be linked to around ten other libraries for the main executable, which at the moment is linked using mpicxx.

Currently I am using/generating the commands below to compile and create the libgpu.a library:

nvcc   -rdc=true -c -o temp.o GPUFloydWarshall.cu
nvcc -dlink -o GPUFloydWarshall.o temp.o -L/usr/local/cuda/lib64 -lcuda -lcudart
rm -f libgpu.a
ar cru libgpu.a GPUFloydWarshall.o
ranlib libgpu.a

When this is all linked into the main executable I get the following error:

problem/libproblem.a(libproblem_a-UTRP.o): In function `UTRP::evaluate(Solution&)':
UTRP.cpp:(.text+0x1220): undefined reference to `gpu_fw(double*, int)'

The gpu_fw function is my wrapper function.

Matt John

1 Answer


Is this at all possible?

Yes, it's possible. And creating a (non-CUDA) wrapper function around it makes it even easier. You can make your life easier still if you rely on C++ linking throughout (you mention a wrapper C function). mpicxx is a C++ compiler/linker alias, and CUDA files (.cu) follow C++ compiler/linker behavior by default. There is a very simple question elsewhere on SO that discusses building CUDA code (encapsulated in a wrapper function) into a static library.

Secondly, the library would then need to be linked to around ten other libraries for the main executable which at the moment using mpicxx.

Once you have a C/C++ (non-CUDA) wrapper exposed in your library, linking should be no different than ordinary linking of ordinary libraries. You may still need to pass the cuda runtime libraries and any other cuda libraries you may be using in the link step, but this is the same conceptually as any other libraries your project may depend on.
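For example, the final mpicxx link line might look like the following (library names and paths here are hypothetical, modeled on the question; nothing CUDA-specific beyond the runtime library is required):

```shell
# hypothetical final link: your ordinary libraries plus the CUDA runtime
mpicxx -o main main.o \
    -L. -lgpu \
    -Lproblem -lproblem \
    -L/usr/local/cuda/lib64 -lcudart
```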

EDIT:

It's not clear you need to use device linking for what you want to do. (But it's acceptable, it just complicates things a bit.) Anyway, now that you have shown the command sequence, your construction of the library is not quite correct. The device link command produces a device-linkable object that does not include all the necessary host pieces. To get everything in one place, we want to add both GPUFloydWarshall.o (which has the device-linked pieces) AND temp.o (which has the host code pieces) to the library.

Here's a fully worked example:

$ cat GPUFloydWarshall.cu
#include <stdio.h>

__global__ void mykernel(){
  printf("hello\n");
}

void gpu_fw(){
  mykernel<<<1,1>>>();
  cudaDeviceSynchronize();
}


$ cat main.cpp
#include <stdio.h>

void gpu_fw();

int main(){

  gpu_fw();
}

$ nvcc   -rdc=true -c -o temp.o GPUFloydWarshall.cu
$ nvcc -dlink -o GPUFloydWarshall.o temp.o -lcudart
$ rm -f libgpu.a
$ ar cru libgpu.a GPUFloydWarshall.o temp.o
$ ranlib libgpu.a
$ g++ main.cpp -L. -lgpu -o main -L/usr/local/cuda/lib64 -lcudart
$ ./main
hello
$
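If you still see an undefined reference after building the archive this way, it can help to confirm the wrapper symbol actually landed in the library. A diagnostic sketch (run against your own libgpu.a):

```shell
# list the archive's symbols and demangle them; the wrapper should
# appear with a 'T' (defined in the .text section), not 'U' (undefined)
nm libgpu.a | c++filt | grep gpu_fw
```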
Robert Crovella
  • I'm not quite sure I understand mainly due to autotools generating the library for me. I have edited my original question to include extra details that should hopefully make things a little clearer. – Matt John Nov 12 '14 at 18:53
  • Following this approach I can now compile the code successfully. When executing, however, the following error is produced: "Error invalid device function at line 84 in file GPUFloydWarshall.cu". Does this mean the kernel is not being compiled? I know the kernel is correct as I have tested it on its own externally. – Matt John Nov 12 '14 at 19:27
  • SO is not really a chat room. You have a different problem now. It's recommended that you post a new question. The kernel is compiled, but the architecture/target does not match the GPU you are running it on. Would need other specifics like actual compile command, GPU you are running on, CUDA version, etc. – Robert Crovella Nov 12 '14 at 19:42
  • 1
    CUDA 6.5 defaults to compiling for a cc2.0 device if no `-arch` switch is specified (ie. the default is `-arch=sm_20`). If you compile as shown above, and then attempt to run on a cc1.x device, you would probably get the error message you indicate ("invalid device function") – Robert Crovella Nov 12 '14 at 20:32
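Following up on the last comment, specifying the target architecture explicitly would look like this (sm_35 here is only an example; match the value to the compute capability of your GPU):

```shell
# compile and device-link for a specific compute capability (example: 3.5)
nvcc -rdc=true -arch=sm_35 -c -o temp.o GPUFloydWarshall.cu
nvcc -dlink -arch=sm_35 -o GPUFloydWarshall.o temp.o
```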