
I'm a newbie looking for help with linking some compiled CUDA object code to a C++ project using g++. There have been some previous questions and solutions for this posted (here and here), but none have worked for me yet and I can't seem to figure out why. Unfortunately, I'm stuck using Windows for this.

The simple example that I'm trying to get working looks like this:

// kernel.h
int cuda_vec_add(float *h_a, float *h_b, float *h_c, int n);

CUDA code adding two vectors.

// kernel.cu
#include <kernel.h>

__global__ void vec_add_kernel(float *a, float *b, float *c, int n) {
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int cuda_vec_add(float *h_a, float *h_b, float *h_c, int n) {
    float *d_a, *d_b, *d_c;

    cudaMalloc(&d_a, n*sizeof(float));
    cudaMalloc(&d_b, n*sizeof(float));
    cudaMalloc(&d_c, n*sizeof(float));

    cudaMemcpy(d_a, h_a, n*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, n*sizeof(float), cudaMemcpyHostToDevice);

    vec_add_kernel<<<(n-1)/256+1, 256>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, n*sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);

    return 0;
}
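The grid size in the launch above, (n-1)/256+1, is a ceiling division: it rounds n/256 up so that every element gets a thread. A minimal sketch of the same idea as a host-side helper follows; the name blocks_for is hypothetical and not part of the original code. Note that this form returns 0 blocks for n = 0, whereas (n-1)/256+1 yields 1.

```cpp
#include <cassert>

// Hypothetical helper (not in the original code): number of blocks needed
// so that blocks * threads_per_block >= n.
inline int blocks_for(int n, int threads_per_block) {
    assert(n >= 0 && threads_per_block > 0);
    // Integer ceiling division: add (divisor - 1) before dividing.
    return (n + threads_per_block - 1) / threads_per_block;
}
```

With the values from the question, blocks_for(5, 256) gives a single block of 256 threads, and the `if (i < n)` guard in the kernel keeps the unused threads from writing out of bounds.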

And the C++ code calling the CUDA function:

// main.cpp
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include <kernel.h>

using namespace std;


int main() {
    const int n = 5;
    float h_A[n] = { 0., 1., 2., 3., 4. };
    float h_B[n] = { 5., 4., 3., 2., 1. };
    float h_C[n];

    cuda_vec_add(h_A, h_B, h_C, n);

    printf("{ 0.0, 1.0, 2.0, 3.0, 4.0 } + { 5.0, 4.0, 3.0, 2.0, 1.0 } = { %0.01f, %0.01f, %0.01f, %0.01f, %0.01f }\n",
        h_C[0], h_C[1], h_C[2], h_C[3], h_C[4]);

    cin.get();

    return 0;
}

I first compiled the CUDA code to "kernel.o" using nvcc:

nvcc -I. -arch=sm_30 -c kernel.cu -o kernel.o

This seems to work fine. But then when I try to link it to my C++ project:

g++ -I. -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64" main.cpp kernel.o -lcuda -lcudart

I get the following error:

Warning: corrupt .drectve at end of def file
C:\Users\Geoff\AppData\Local\Temp\cczu0qxj.o:main.cpp:(.text+0xbe):
undefined reference to `cuda_vec_add(float*, float*, float*, int)'
collect2.exe: error: ld returned 1 exit status

I'm using CUDA toolkit 7.5 with Visual Studio 2013 and gcc version 5.2.0.

So far I've tried:

  • Compiling everything with nvcc. This works fine except it doesn't fit the requirements of my project.

  • The solution posted here using the -dlink flag in nvcc. Unfortunately, this returned the same error.

  • Some other, less productive things.

Really sorry if this ends up being a dumb mistake, but I've been stuck on it for a while. Thanks for your help.

Geoff M
  • To understand if this is a name mangling problem, can you run `nm` on `kernel.o` to see what the `cuda_vec_add` function looks like? Also, the warning about the corruption seems weird. – Rudolfs Bundulis Mar 09 '16 at 08:47
  • You can't use g++ or GNU tools in general on a Windows platform with CUDA. It's an unsupported configuration. Refer to the Windows installation guide for supported compilers and configs. Basically you have to use Visual Studio and the MS C++ compiler, cl.exe. – Robert Crovella Mar 09 '16 at 10:15
  • It looks like Robert is correct that this is a problem with using g++ with CUDA in Windows. I just tried it on a Linux box and got no errors. Thanks for your help. – Geoff M Mar 09 '16 at 18:48

1 Answer

If the issue really is a name mangling difference between g++ and cl, so that g++ simply does not see the function, try declaring it inside an extern "C" {} block to force C linkage. That may help.

EDIT

So I tried doing the same and I was able to link successfully. Let me post what I did and I hope this helps you.

So what I have on my system is CUDA toolkit 7.5 and mingw x64 (gcc 4.5.4).

I put your code in three files as described by you - kernel.cu, main.cpp and kernel.h, and I changed kernel.h to

#pragma once

extern "C" 
{
int cuda_vec_add(float *h_a, float *h_b, float *h_c, int n);
}
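As a hedged aside, the declaration has to be visible with C linkage in every translation unit, both the one that defines the function and the one that calls it. If main.cpp still sees a plain C++ declaration through the header, g++ will go looking for a mangled name. The sketch below shows the usual guarded form of such a header, which also stays valid if it is ever included from C. It uses a hypothetical CPU stand-in, vec_add_host, rather than the real CUDA function, so it compiles without nvcc.

```cpp
#include <cassert>

// Guarded form: C++ compilers see extern "C", C compilers skip it.
#ifdef __cplusplus
extern "C" {
#endif

// Hypothetical CPU stand-in with a signature similar to cuda_vec_add.
int vec_add_host(const float *a, const float *b, float *c, int n);

#ifdef __cplusplus
}
#endif

// The definition follows a declaration that already has C linkage,
// so it keeps that linkage without repeating extern "C".
int vec_add_host(const float *a, const float *b, float *c, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
    return 0;
}
```

The symbol emitted for vec_add_host is then the plain name, which is what both g++ and cl expect for a C function on x64.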

Then I did

nvcc kernel.cu -c -o kernel.obj
g++.exe -c main.cpp -o main.obj
g++.exe  main.obj kernel.obj "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\Win32\cuda.lib" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\Win32\cudart.lib" -o main.exe

And happily got main.exe. The curious thing was that nvcc automatically invoked the 64-bit compiler. Maybe that is the issue for you, if you are using a 32-bit g++ and nvcc makes a 64-bit object file, but that depends on the configuration of your system.

I also used objdump to dump the kernel.obj file and my entry for cuda_vec_add looks like this:

[302](sec  5)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x0000000000000050 cuda_vec_add

Unlike your entry, this name is not mangled. Did you properly apply the extern "C" {} block?
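To make the comparison concrete: cl and g++ encode C++ function names differently. MSVC-style names start with '?' (as in ?cuda_vec_add@@YAHPEAM00H@Z from your nm output), while g++ on x64 uses the Itanium scheme, where the same function would be _Z12cuda_vec_addPfS_S_i. Here is a rough sketch of a classifier for symbols copied out of nm or objdump; guess_mangling is a hypothetical helper and only a prefix heuristic, not a real demangler.

```cpp
#include <cassert>
#include <string>

enum class Mangling { None, Itanium, Msvc };

// Hypothetical heuristic: guess the mangling scheme from the symbol prefix.
// 32-bit Windows targets add a leading underscore, which this ignores.
inline Mangling guess_mangling(const std::string &symbol) {
    assert(!symbol.empty());
    if (symbol[0] == '?') return Mangling::Msvc;                     // cl.exe style
    if (symbol.compare(0, 2, "_Z") == 0) return Mangling::Itanium;   // g++ style
    return Mangling::None;                                           // plain C name
}
```

If kernel.obj holds an MSVC-style name and main.obj references an Itanium-style one, the linker cannot match them, which is the "undefined reference" you saw.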

Also double-check the bitness. I initially linked against the x86 libs, and g++ only complained about not being able to resolve cudaMemcpy; it said nothing about the libs being incompatible. objdump can help you with this: it prints the architecture in the first line. For example, when I ran objdump kernel.obj -t, the first line was

kernel.obj:     file format pe-x86-64

so you can inspect both of your object files to check that they match.
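If objdump is not at hand, the same check can be sketched in a few lines of C++. A plain COFF object file starts with a two-byte little-endian Machine field: 0x8664 for x86-64 and 0x014c for i386. The helper names below are hypothetical, and the sketch assumes an ordinary COFF object (not, for example, one built with link-time code generation, which uses a different header).

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Map the COFF Machine field to a readable label.
inline const char *coff_machine_name(std::uint16_t machine) {
    switch (machine) {
        case 0x8664: return "x86-64";
        case 0x014c: return "i386";
        default:     return "unknown";
    }
}

// Read the first two bytes of the file as a little-endian 16-bit value.
// Returns 0 if the file cannot be opened or is too short.
inline std::uint16_t read_coff_machine(const std::string &path) {
    assert(!path.empty());
    std::ifstream in(path, std::ios::binary);
    unsigned char b[2] = {0, 0};
    if (!in.read(reinterpret_cast<char *>(b), 2)) return 0;
    return static_cast<std::uint16_t>(b[0] | (b[1] << 8));
}
```

Calling coff_machine_name(read_coff_machine("kernel.obj")) and doing the same for main.obj should give the same label for both files if they can be linked together.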

Rudolfs Bundulis
  • It seems that the problem wasn't caused by name mangling. Adding an extern "C" {} block around the host code in kernel.cu didn't change the error. It looks like Robert was correct that it's a Windows-specific issue with the tools I was using. Thanks for your help. – Geoff M Mar 09 '16 at 18:50
  • @GeoffM Well, Robert is correct that Visual Studio is the officially supported toolkit; however, the main issue that would arise when mixing g++ and MSVC is name mangling. Just out of curiosity, can you dump the symbols from the kernel object file to see what is inside it? – Rudolfs Bundulis Mar 09 '16 at 19:13
  • Ok, sure. I don't know enough to understand the output of `nm` on `kernel.o`, but I've pasted the full output here - http://pastebin.com/8Dj7j8iU. The lines that reference the "cuda_vec_add" function look like this - `000000000000000c p $pdata$?cuda_vec_add@@YAHPEAM00H@Z` – Geoff M Mar 09 '16 at 19:59
  • @GeoffM Despite the downvote (yeah, I know I'm advertising unsupported mechanisms, but hey, if this person has a requirement to use g++ and can't do anything else, this is why we are here to help), I tried doing this myself and succeeded. I'll edit my post; I hope this is helpful. – Rudolfs Bundulis Mar 10 '16 at 22:10
  • And I would really like to know the reason for the downvote, if the person who downvoted looks at this. – Rudolfs Bundulis Mar 10 '16 at 22:22
  • Hi Rudolfs, thanks for following up. Your solution works for me as well! It looks like I didn't apply the `extern "C" {}` block properly, although I need to play around with it some more to make sure. I tried to do it inside the `kernel.cu` code instead of the header file. Thanks for your help! Not sure why you got downvoted - I don't have enough rep yet to upvote you back up apparently. – Geoff M Mar 12 '16 at 01:02
  • Again, if anyone downvotes, could they leave at least minimal feedback? – Rudolfs Bundulis May 30 '16 at 20:15