CUDA and Linker errors

Question

This may be a similar question to Linker errors 2005 and 1169 (multiply defined symbols) when using CUDA __device__ functions (should be inline by default), but not exactly. I'm getting several LNK2005 errors when trying to build a project (using code that has been shown to work elsewhere) on VS2010. I'm at my wits' end.

For example, I have the following three files: transposeGPU.h, transposeGPU.cu, and transposeCUDA.cu. transposeGPU.h can be summarized as follows:

void transposeGPU(float *d_dst, size_t dst_pitch,
    float *d_src, size_t src_pitch,
    unsigned int width, unsigned int height);

i.e., a single declaration without any includes. The definition of that function is found in transposeGPU.cu, which can be summarized as follows:

#include <stdio.h>
#include "../transposeGPU.h"
#include "../helper_funcs.h"

#include "transposeCUDA.cu"

void
transposeGPU(float *d_dst, size_t dst_pitch,
    float *d_src, size_t src_pitch,
    unsigned int width, unsigned int height)
{
    // execution configuration parameters
    dim3 threads(16, 16);
    dim3 grid(iDivUp(width, 16), iDivUp(height, 16));
    size_t shared_mem_size =
        (threads.x * threads.y + (threads.y - 1)) * sizeof(float);

    transposeCUDA<<<grid, threads, shared_mem_size>>>(
        d_dst, dst_pitch / sizeof(float),
        d_src, src_pitch / sizeof(float),
        width, height);
}

i.e., transposeGPU.cu includes its header file and transposeCUDA.cu, defines transposeGPU(), and calls transposeCUDA(), which is defined in transposeCUDA.cu. Now, transposeCUDA.cu defines the function as expected:

#include "common_kernel.h"

__global__ void
transposeCUDA(
    float *g_dst, size_t s_dst_pitch,
    const float *g_src, size_t s_src_pitch,
    unsigned int img_width, unsigned int img_height)
{
// several lines of code...
}

It all looks in order, but I still get error LNK2005: "void __cdecl __device_stub__Z13transposeCUDAPfjPKfjjj(float *,unsigned int,float const *,unsigned int,unsigned int,unsigned int)" (?__device_stub__Z13transposeCUDAPfjPKfjjj@@YAXPAMIPBMIII@Z) already defined in transposeCUDA.obj in transposeGPU.obj.

That and some twenty other similar linker errors. Why? There's no apparent redefinition occurring. Any help would be greatly appreciated.

score 1 · Accepted Answer · answered Mar 12 '11 at 13:17

There is a redefinition if you compile both transposeCUDA.cu and transposeGPU.cu: transposeGPU.cu #includes transposeCUDA.cu, so the kernel definition ends up in both translation units. Do one or the other: either #include transposeCUDA.cu, or compile that file separately with nvcc, but not both.

answered by William Pursell

I'm not sure I understand. Should I write a header file for `transposeCUDA.cu` and include that in `transposeGPU.cu`? `transposeGPU.cu` needs at least a declaration of `transposeCUDA()` to work. – Kristian D'Amato Mar 12 '11 at 13:21
Easiest solution is to provide only a declaration in transposeGPU – William Pursell Mar 12 '11 at 13:30
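
The declaration-only approach suggested in the comment above could look roughly like the sketch below. The header name transposeCUDA.h, the iDivUp stand-in (assumed to be a ceiling division, since helper_funcs.h is not shown), and the naive kernel body are all assumptions for illustration; the original kernel body is elided in the question. The three sections are shown in one listing so it compiles as a single file; in the project each would live in the file named in its comment.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// --- transposeCUDA.h (hypothetical header): declaration only, no body ---
__global__ void transposeCUDA(
    float *g_dst, size_t s_dst_pitch,
    const float *g_src, size_t s_src_pitch,
    unsigned int img_width, unsigned int img_height);

// --- stand-in for helper_funcs.h (assumed behaviour: ceiling division) ---
inline unsigned int iDivUp(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}

// --- transposeGPU.cu: would #include "transposeCUDA.h", not the .cu file ---
void transposeGPU(float *d_dst, size_t dst_pitch,
                  float *d_src, size_t src_pitch,
                  unsigned int width, unsigned int height)
{
    dim3 threads(16, 16);
    dim3 grid(iDivUp(width, 16), iDivUp(height, 16));
    size_t shared_mem_size =
        (threads.x * threads.y + (threads.y - 1)) * sizeof(float);

    // Only the declaration is visible here; the body lives elsewhere.
    transposeCUDA<<<grid, threads, shared_mem_size>>>(
        d_dst, dst_pitch / sizeof(float),
        d_src, src_pitch / sizeof(float),
        width, height);
}

// --- transposeCUDA.cu: the only place the kernel body appears ---
__global__ void transposeCUDA(
    float *g_dst, size_t s_dst_pitch,
    const float *g_src, size_t s_src_pitch,
    unsigned int img_width, unsigned int img_height)
{
    // Naive placeholder body (not the original kernel): one element per thread.
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < img_width && y < img_height)
        g_dst[x * s_dst_pitch + y] = g_src[y * s_src_pitch + x];
}
```

Whether a launch resolves to a kernel defined in a different object file may depend on the CUDA toolkit version and build settings. If it does not link in a given setup, the other option from the answer still applies: keep the #include of transposeCUDA.cu and exclude transposeCUDA.cu itself from the project's build, so the definition is compiled only once.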

score 0 · Answer 2 · answered Mar 12 '11 at 13:24

To clarify: __device__ functions are inlined (at least on pre-Fermi hardware), but __global__ functions are not -- after all, you cannot inline GPU code into your CPU executable. Global functions can have their address taken; the only difference is that the address points into GPU memory (similar to how normal pointers to data stored on the GPU look like plain pointers).

As William Pursell said, if you compile your global function twice, you get two functions with the same definition, leading to the linker error.
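
To illustrate the distinction, here is a minimal sketch, assuming a hypothetical shared header and hypothetical names (pitchedIndex, copyKernel) that are not part of the original project. An inline helper can safely be included by several .cu files, whereas a __global__ kernel comes with a host-side launch stub (the __device_stub__ symbol in the error message), so its body should be compiled in exactly one file.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// --- hypothetical shared header, included by several .cu files ---
// Marked inline, so every translation unit may carry its own copy
// without clashing at link time.
__host__ __device__ inline size_t pitchedIndex(unsigned int x,
                                               unsigned int y,
                                               size_t pitch)
{
    return static_cast<size_t>(y) * pitch + x;
}

// --- a single .cu file: the kernel body is defined once ---
// Compiling this definition into two object files would emit the
// host-side stub twice and reproduce the LNK2005 error.
__global__ void copyKernel(float *dst, const float *src, size_t pitch,
                           unsigned int width, unsigned int height)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        dst[pitchedIndex(x, y, pitch)] = src[pitchedIndex(x, y, pitch)];
}
```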
