0

I have the following trivial thrust::gather program (taken directly from the thrust::gather documentation)

#include <thrust/gather.h>
#include <thrust/device_vector.h>
int main(void)
{

    // mark even indices with a 1; odd indices with a 0
    int values[10] = {1, 0, 1, 0, 1, 0, 1, 0, 1, 0};
    thrust::device_vector<int> d_values(values, values + 10);
    // gather all even indices into the first half of the range
    // and odd indices to the last half of the range
    int map[10] = {0, 2, 4, 6, 8, 1, 3, 5, 7, 9};
    thrust::device_vector<int> d_map(map, map + 10);
    thrust::device_vector<int> d_output(10);
    thrust::gather(d_map.begin(), d_map.end(),
            d_values.begin(),
            d_output.begin());
    // d_output is now {1, 1, 1, 1, 1, 0, 0, 0, 0, 0}
    return 0;
}

I compile this with

/usr/local/cuda/bin/nvcc -ccbin g++ -I../../common/inc  -m64 -g -G -gencode arch=compute_30,code=sm_30 -o thrustGather.o -c thrustGather.cu

/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -g -G -o thrustGather thrustGather.o

Next I try running this simple program after first attaching it to cuda-gdb:

>cuda-gdb ./thrustGather
NVIDIA (R) CUDA Debugger
5.5 release
Portions Copyright (C) 2007-2013 NVIDIA Corporation
GNU gdb (GDB) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/cuda-5.5/samples/0_Simple/thrustGatherRjm/thrustGather...done.
(cuda-gdb) run
Starting program: /usr/local/cuda-5.5/samples/0_Simple/thrustGatherRjm/thrustGather 
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff7272700 (LWP 50318)]
[Context Create of context 0x78d790 on Device 0]
[Launch of CUDA Kernel 0 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 1 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 2 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 3 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 4 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 5 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 6 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 7 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
Error: received unexpected signal: Segmentation fault
BACKTRACE (41 frames):
cuda-gdb[0x4394e1]
/lib64/libc.so.6[0x3d96635a90]
cuda-gdb[0x5b038b]
cuda-gdb[0x55aae8]
cuda-gdb[0x55ed65]
cuda-gdb[0x55fc51]
cuda-gdb[0x55ec22]
cuda-gdb[0x5609fe]
cuda-gdb[0x5607bd]
cuda-gdb[0x560c36]
cuda-gdb[0x4f7e44]
cuda-gdb[0x4f8038]
cuda-gdb[0x4fde3c]
cuda-gdb[0x5c9f66]
cuda-gdb[0x429c3c]
cuda-gdb[0x5ca4e5]
cuda-gdb[0x5cab5e]
cuda-gdb[0x4296e6]
cuda-gdb[0x479366]
cuda-gdb[0x53addd]
cuda-gdb[0x5129c0]
cuda-gdb[0x5134fd]
cuda-gdb[0x51369d]
cuda-gdb[0x5091e7]
cuda-gdb[0x40f65d]
cuda-gdb[0x522f54]
cuda-gdb[0x523a20]
cuda-gdb[0x5ff9aa]
cuda-gdb[0x522fb9]
cuda-gdb[0x521b81]
cuda-gdb[0x522b1e]
cuda-gdb[0x51d0cb]
cuda-gdb[0x4ae816]
cuda-gdb[0x406429]
cuda-gdb[0x51d0cb]
cuda-gdb[0x406b76]
cuda-gdb[0x51d0cb]
cuda-gdb[0x406204]
cuda-gdb[0x4061d6]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x3d96621b75]
cuda-gdb[0x4060e9]
[Termination of CUDA Kernel 7 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 6 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 5 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 4 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 3 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 2 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 1 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 0 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]

Note that cuda-gdb, itself, segfaults. I also ran the correpsonding expamples for

  • thrust::gather_if
  • thrust::scatter
  • thrust::count
  • thrust::inclusive_scan
  • thrust::sort_by_key
  • thrust::reduce_by_key

And of these, only the last three (inclusive_scan, sort_by_key, reduce_by_key), work (i.e., do not crash cuda-gdb).

This must be an issue with the latest release (5.5) of thrust and/or cuda-gdb, because I have run the same tests with release 5.0 with no issues whatsoever.

Here is some info about my setup:

> cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  319.21  Sat May 11 23:51:00 PDT 2013
GCC version:  gcc version 4.8.1 20130603 (Red Hat 4.8.1-1) (GCC) 

> cat /proc/version 
Linux version 3.9.9-302.fc19.x86_64 (mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 4.8.1 20130603 (Red Hat 4.8.1-1) (GCC) ) #1 SMP Sat Jul 6 13:41:07 UTC 2013

> gcc --version
gcc (GCC) 4.8.1 20130603 (Red Hat 4.8.1-1)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

> lspci | grep NVIDIA
05:00.0 3D controller: NVIDIA Corporation GK104 [GeForce GTX 690] (rev a1)
05:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 690] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
rmccabe3701
  • 1,418
  • 13
  • 31
  • Thrust is known to not compile and run correctly when built for debugging. – talonmies Jul 17 '13 at 15:45
  • Didn't know that, thanks. How do you suggest that I go about debugging a much larger CUDA program with several thrust calls embedded in it (I don't care to debug into the thrust kernels themselves, just my own kernels, problem is having thrust calls in the program causes cuda-gdb to crash)? – rmccabe3701 Jul 17 '13 at 15:53
  • I suppose I could put all my thrust calls in its own .cu file and compile that without the -G flag. The rest of my code i can compile with debugging – rmccabe3701 Jul 17 '13 at 16:00
  • Given you are running a kepler GPU, you could try splitting out the Thrust code into its own source file and using separate complation mode. Bult the thrust with optimisation and your own code with debugging symbols and link them together. I have no idea if that would work, but it might be worth exploring – talonmies Jul 17 '13 at 16:01
  • Sweet, breaking the thrust calls out into a different file works great! It was kinda a pain to do because I needed to explicitly instantiate each thrust template i was using (otherwise I got link errors). – rmccabe3701 Jul 17 '13 at 18:18
  • 1
    Could you add that as a short answer? That way you can leave the solution behind for the next person with the same problem. Youncan accept your oen answer too, which will take it off the unanswered question list and make it easier to find by search. – talonmies Jul 17 '13 at 20:34
  • The fix for this crash should go into the final release of CUDA 5.5. – Jared Hoberock Jul 18 '13 at 22:44

1 Answers1

1

As talonmies indicated, the problem is the Thrust libraries do not run correctly when built with debugging. In my application I have a rather complicated .cu file containing several of my own CUDA kernels, along with multiple Thrust calls. If I were to compile this file with the -g -G debug flags and run inside cuda-gdb, it would crash -- making it impossible for me to debug my kernels.

Since I do not care about debugging the Thrust calls themselves (only my kernels), my solution involved putting all my thrust calls in another file thrustWrappers.cu and compiling this file without debugging. Then in my main .cu file I would replace the calls to thrust with associated wrapper functions (defined in thrustWrappers). For example,

thrust::reduce(...)

became

thrust::reduce_wrapper(...)

Then I would link the two resulting object files together.

rmccabe3701
  • 1,418
  • 13
  • 31