thrust in cuda kernel

Question

I have CUDA 8.0 installed on my machine (Linux SL7). I have also downloaded Thrust 1.8.1 and replaced the existing Thrust library with it.

As far as I know, starting from Thrust 1.8, Thrust algorithms can be called from inside kernels. I quote from their website:

Thrust 1.8.0 introduces support for algorithm invocation from CUDA __device__ code, support for CUDA streams, and algorithm performance improvements. Users may now invoke Thrust algorithms from CUDA __device__ code

However, when I build the application using Nsight Eclipse Edition, it shows me this error:

calling a __host__ function("thrust::sort") from a __global__ function("mykernel") is not allowed.

Any advice?

Here is my code:

#include <iostream>
#include <numeric>
#include <stdlib.h>
#include <stdio.h>
#include <cuda_runtime.h>
#include <cuda.h>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>

__global__ void mykernel(int* a, int* b)
{
    thrust::sort(a, a + 10);
}

int main(void)
{
    int a[10] = { 0, 9, 7, 3, 1, 6, 4, 5, 2, 8 };
    int b[10];
    int *d_a, *d_c;

    cudaMalloc((void**)&d_a, 10 * sizeof(int));
    cudaMalloc((void**)&d_c, 10 * sizeof(int));

    std::cout << "A\n";
    for (int i = 0; i < 10; ++i) {
        std::cout << a[i] << "  ";
    }

    cudaMemcpy(d_a, a, 10 * sizeof(int), cudaMemcpyHostToDevice);
    mykernel<<<1, 1>>>(d_a, d_c);
    // Copy back d_a: that is the buffer the kernel sorts in place.
    cudaMemcpy(a, d_a, 10 * sizeof(int), cudaMemcpyDeviceToHost);
    std::cout << "\nA\n";
    for (int i = 0; i < 10; ++i) {
        std::cout << a[i] << "  ";
    }

    cudaFree(d_a);
    cudaFree(d_c);
    return 0;
}

Possible duplicate of [Thrust inside user written kernels](http://stackoverflow.com/questions/5510715/thrust-inside-user-written-kernels) — Soeren, Feb 07 '17 at 14:29

score 8 · Answer 1 · answered Feb 06 '17 at 19:23

You are correct: Thrust 1.8 and newer do support algorithm calls from device code. However, to take advantage of this, you need to pass one of the new execution policies as the first argument. The overloads without a policy are host-only, which is exactly what the error message is telling you.
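Thrust 1.8 also added `thrust::seq`, a policy that runs the algorithm sequentially inside the calling thread. The sketch below is an illustration, not part of the original answer: the kernel `sort_segments`, the helper `sort_segments_on_device` and the segment length `kSegment` are hypothetical names chosen here. Each thread sorts its own fixed-size slice of the array, so no child kernels are involved.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>

// Hypothetical segment length: each thread sorts kSegment contiguous ints.
const int kSegment = 5;

// One thread per segment; thrust::seq keeps the whole sort inside the
// calling thread, so no child kernels are launched.
__global__ void sort_segments(int* data, int num_segments)
{
    int seg = blockIdx.x * blockDim.x + threadIdx.x;
    if (seg < num_segments) {
        int* first = data + seg * kSegment;
        thrust::sort(thrust::seq, first, first + kSegment);
    }
}

// Copies n ints (n a multiple of kSegment) to the device, sorts each
// segment there, and copies them back. Returns false on any CUDA error.
bool sort_segments_on_device(int* host, int n)
{
    int num_segments = n / kSegment;
    int* d = 0;
    if (cudaMalloc((void**)&d, n * sizeof(int)) != cudaSuccess) return false;
    cudaError_t err = cudaMemcpy(d, host, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        sort_segments<<<1, num_segments>>>(d, num_segments);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess;
}

int main(void)
{
    int a[10] = { 0, 9, 7, 3, 1, 6, 4, 5, 2, 8 };
    if (!sort_segments_on_device(a, 10)) {
        std::printf("CUDA error\n");
        return 1;
    }
    // Sanity check: every segment is now ascending.
    for (int i = 1; i < 10; ++i)
        if (i % kSegment != 0) assert(a[i - 1] <= a[i]);
    for (int i = 0; i < 10; ++i) std::printf("%d ", a[i]);
    std::printf("\n");  // expected: 0 1 3 7 9 2 4 5 6 8
    return 0;
}
```

With the question's input, each five-element half is sorted independently, so the result is ascending within each half rather than across the whole array.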

If you use the overload of sort that takes an execution policy, like this:

__global__ void mykernel(int* a, int* b)
{
    thrust::sort(thrust::device, a, a + 10);
}

you should find the code compiles correctly.
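Putting that fix into the question's program gives something like the sketch below. Besides the execution policy, it closes the launch with `>>>`, copies the result back from `d_a` (the buffer the kernel actually sorts) and checks for CUDA errors. The kernel taking a length instead of the unused second pointer, and the helper `sort_on_device`, are illustrative changes, not from the original post.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>

// Sorts a[0..n) in place. Passing thrust::device selects the
// device-callable overload of thrust::sort, which fixes the original error.
__global__ void mykernel(int* a, int n)
{
    thrust::sort(thrust::device, a, a + n);
}

// Copies host[0..n) to the device, sorts it there with a single-thread
// launch (as in the question), and copies it back. False on CUDA error.
bool sort_on_device(int* host, int n)
{
    int* d_a = 0;
    if (cudaMalloc((void**)&d_a, n * sizeof(int)) != cudaSuccess) return false;
    cudaError_t err = cudaMemcpy(d_a, host, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        mykernel<<<1, 1>>>(d_a, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        // Copy back from d_a: that is the buffer the kernel sorted.
        err = cudaMemcpy(host, d_a, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    return err == cudaSuccess;
}

int main(void)
{
    int a[10] = { 0, 9, 7, 3, 1, 6, 4, 5, 2, 8 };
    if (!sort_on_device(a, 10)) {
        std::printf("CUDA error\n");
        return 1;
    }
    for (int i = 1; i < 10; ++i) assert(a[i - 1] <= a[i]);  // ascending
    for (int i = 0; i < 10; ++i) std::printf("%d ", a[i]);
    std::printf("\n");  // expected: 0 1 2 3 4 5 6 7 8 9
    return 0;
}
```

A hedged note on building: as I understand Thrust 1.8, `thrust::device` inside device code can launch child kernels through dynamic parallelism only when the code is compiled as relocatable device code for sm_35 or newer (for example `nvcc -arch=sm_35 -rdc=true sort.cu -lcudadevrt`); otherwise the algorithm runs sequentially in the calling thread. Check the Thrust and CUDA documentation for your toolkit before relying on either behavior.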
