The following complete example throws an "invalid argument" exception from thrust::sort_by_key when the "Generate relocatable device code" flag is set to true (Visual Studio 2013, CUDA 7.5), but works fine when the flag is false.

This is a problem for me in a bigger project with classes where I need this flag to be true.

Am I making a mistake somewhere?

#include <cassert>
#include <cstdio>

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "thrust/sort.h"
#include "thrust/device_ptr.h"

// Print the failing call and its location, then abort, if a CUDA runtime call fails.
#define cudaCheck(x) { cudaError_t err = x; if (err != cudaSuccess) { printf("Cuda error: %s in %s at %s:%d\n", cudaGetErrorString(err), #x, __FILE__, __LINE__); assert(0); } }

int main() {
    const int N = 6;
    int    keys2[N] = { 1, 4, 2, 8, 5, 7 };
    char values[N] = { 'a', 'b', 'c', 'd', 'e', 'f' };

    int* dkeys;
    cudaCheck(cudaMalloc((void**)&dkeys, sizeof(int) * N));
    char* dvalues;
    cudaCheck(cudaMalloc((void**)&dvalues, sizeof(char) * N));

    cudaCheck(cudaMemcpy(dkeys, keys2, sizeof(int) * N, cudaMemcpyHostToDevice));
    cudaCheck(cudaMemcpy(dvalues, values, sizeof(char) * N, cudaMemcpyHostToDevice));

    thrust::device_ptr<int> tkeys(dkeys);
    thrust::device_ptr<char> tvalues(dvalues);

    thrust::sort_by_key(tkeys, tkeys + N, tvalues);

    cudaCheck(cudaDeviceSynchronize());
    return 0;
}
Kinru
  • I don't have any trouble with this code with the relocatable device code setting in an x64 release or debug project on VS2013/CUDA 7.5. Are you building a 32-bit or a 64-bit project? Debug or release? Have you set the GPU architecture compile properties (arch=compute_xx,code=sm_xx) to match the GPU you are actually running on? What GPU are you running on? – Robert Crovella Mar 20 '16 at 15:41
  • It crashes in both release and debug mode on an x64 project. The GPU is a GTX 980 Ti and I am building with -gencode=arch=compute_20,code=\"sm_20,compute_20\" – Kinru Mar 20 '16 at 16:54
  • Try building with `-gencode=arch=compute_52,code=\"sm_52,compute_52\"` – Robert Crovella Mar 20 '16 at 16:56
  • It works now, thanks! :) I didn't know that there was a problem with this code on compute 2.0 though. I thought it should still be valid. – Kinru Mar 20 '16 at 17:06
  • Probably it should still be valid. It may be a thrust bug. But thrust is a fairly involved abstraction layer, and so there is definitely [the possibility of gotchas when you compile for an architecture that does not match what you are running on](http://stackoverflow.com/questions/29450778/why-intersection-of-thrust-library-is-returning-unexpected-result/29573827#29573827). My advice would be to report this on the thrust-users google groups mailing list, or else file a [thrust issue](https://github.com/thrust/thrust/issues). – Robert Crovella Mar 20 '16 at 17:18
  • Anyway the usual advice with thrust is to build a release/x64 project, with code generation for the GPU you will run on. – Robert Crovella Mar 20 '16 at 17:19
  • The underlying root cause here may possibly be [an issue with CUB](https://groups.google.com/forum/#!category-topic/cub-users/bugs-and-issues/z5oMxaYaSEs). – Robert Crovella Mar 22 '16 at 00:52
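
To make the advice in the comments concrete (generate code for the GPU you actually run on), here is a minimal sketch of how the `-gencode` option relates to a device's compute capability. The helper `gencode_flag` is hypothetical, not part of CUDA or Thrust. It assumes the major and minor numbers have been obtained separately, for example from the `major` and `minor` fields of `cudaDeviceProp` as filled in by `cudaGetDeviceProperties`. The output follows the escaped-quote form used in the comments above.

```cpp
#include <string>

// Hypothetical helper (not a CUDA or Thrust API): builds the nvcc -gencode
// option for a device with compute capability major.minor, in the same
// escaped-quote form that appears in the comment thread.
// Assumption: major/minor were read elsewhere, e.g. from cudaDeviceProp.
std::string gencode_flag(int major, int minor) {
    // Compute capability 5.2 becomes the suffix "52", 2.0 becomes "20".
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode=arch=compute_" + cc +
           ",code=\\\"sm_" + cc + ",compute_" + cc + "\\\"";
}
```

For the GTX 980 Ti in the question (compute capability 5.2), `gencode_flag(5, 2)` yields the flag suggested above, while `gencode_flag(2, 0)` yields the compute_20 flag that triggered the failure.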

0 Answers