I encountered a strange error when trying to construct a thrust::device_vector<unsigned char> using thrust::device_vector<unsigned char> data(10). The error was "parallel_for failed: invalid device function".

Here is my minimal code to reproduce this error.

main.cpp

#ifdef UNIT_TEST

#define CATCH_CONFIG_MAIN
#include "catch.hpp"

#endif      // UNIT_TEST

myheader.h

#ifndef MYHEADER_H_
#define MYHEADER_H_

#include <string>
#include <vector>
#include <thrust/device_vector.h>

namespace AAA {
namespace BBB{

using byte_t = unsigned char;

using ByteValues = std::vector<byte_t>;
using DByteValues = thrust::device_vector<byte_t>;

#define NaB byte_t(-1)        // Not-a-Byte

}           // namespace BBB
}           // namespace AAA

#endif      // MYHEADER_H_

mytest.cu

#include "catch.hpp"
#include "myheader.h"

namespace AAA {
namespace BBB {

TEST_CASE("Test thrust::device_vector", "[thrust::device_vector]") {
    SECTION("constructor should work") {
        REQUIRE_NOTHROW( DByteValues(10) );
    }
}

}
}

build commands

g++ -DUNIT_TEST -std=c++14 -g3 -O0 -Wall -fmessage-length=0 -pthread -I/usr/local/cuda/include -Iinclude -Iunit-test -I/usr/local/include -c -o .obj/debug/./main.o main.cpp
nvcc -DUNIT_TEST -std=c++14 -m64 -arch=compute_30 -code=sm_30 -dc -expt-extended-lambda -g -G -Xcompiler -Wall,-fmessage-length=0,-pthread -I/usr/local/cuda/include -Iinclude -Iunit-test -I/usr/local/include -c -o .obj/debug/unit-test/mytest.o unit-test/mytest.cu
nvcc -Xlinker -s -L/usr/local/lib -L/usr/local/cuda/lib64 -lcudart -o .bin/debug/MyTest .obj/debug/./main.o .obj/debug/unit-test/mytest.o

system information

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

$ g++ --version
g++ (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ nvidia-smi
Mon Jan 21 22:34:40 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 745     Off  | 00000000:01:00.0  On |                  N/A |
| 20%   41C    P8    N/A /  N/A |    158MiB /  4040MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1399      G   /usr/lib/xorg/Xorg                            59MiB |
|    0      1556      G   /usr/bin/sddm-greeter                         95MiB |
+-----------------------------------------------------------------------------+

After building the test project I ran it and received the following error.

$ .bin/debug/MyTest 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MyTest is a Catch v2.5.0 host application.
Run with -? for options

-------------------------------------------------------------------------------
Test thrust::device_vector
  constructor should work
-------------------------------------------------------------------------------
unit-test/mytest.cu:9
...............................................................................

unit-test/mytest.cu:10: FAILED:
  REQUIRE_NOTHROW( DByteValues(10) )
due to unexpected exception with message:
  parallel_for failed: invalid device function

===============================================================================
test cases: 1 | 1 failed
assertions: 1 | 1 failed

Please help. Thank you!

  • Your GPU appears to be a Maxwell architecture GPU (compute capability 5.x). Run `deviceQuery` to confirm this, then follow the instructions in the linked duplicate to modify your compile architecture switches (`-arch=compute_30 -code=sm_30`) – Robert Crovella Jan 22 '19 at 03:22
  • Thank you! I used '-gencode arch=compute_30,code=sm_30 -gencode arch=compute_50,code=sm_50' to solve the issue. – yhf8377 Jan 22 '19 at 11:23
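
For anyone landing here with the same message: the original switches (`-arch=compute_30 -code=sm_30`) embed machine code for compute capability 3.0 only, so a compute capability 5.x device has no kernel image it can run. The fix in the comment above adds one `-gencode` pair per target. The sketch below only illustrates how those flag strings are put together from (major, minor) compute capability numbers. The helper names `gencode_flag` and `gencode_flags` are hypothetical and not part of nvcc or Thrust, and the target list is an assumption you would replace with whatever `deviceQuery` reports for your GPUs.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of nvcc or Thrust): builds one nvcc
// -gencode option for a given compute capability, e.g. (5, 0) -> sm_50.
std::string gencode_flag(int major, int minor) {
    // Sanity check only; this does not verify that the installed toolkit
    // actually supports the requested architecture.
    assert(major > 0 && minor >= 0 && minor <= 9);
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}

// Hypothetical helper: joins the options for several target architectures
// with single spaces so the result can be pasted into an nvcc command line.
std::string gencode_flags(const std::vector<std::pair<int, int>>& targets) {
    std::string out;
    for (const auto& target : targets) {
        if (!out.empty()) {
            out += ' ';
        }
        out += gencode_flag(target.first, target.second);
    }
    return out;
}
```

Calling `gencode_flags({{3, 0}, {5, 0}})` yields the same two `-gencode` options quoted in the comment above, which would replace `-arch=compute_30 -code=sm_30` in the nvcc compile command.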
