
I have written a class which I was using in a parallelized (multithreaded CPU) program.

I am in the process of porting that program to run on CUDA hardware (a GPGPU system).

I am nearly there with it, but I am encountering the following error:

calling a __host__ function("simulation::simulation") from a __global__ function("cuda_kernel_func") is not allowed

So I know that what I am doing is not allowed, but I do not know why. I would like to know:

  • Firstly, why I cannot do what I am trying to do. (See code.)
  • Secondly, what I can do instead.

Perhaps the answer is simply "you cannot use classes and OOP with CUDA"? Or perhaps that is not correct and the answer is something different.

The easiest way to explain what I am trying to do is of course by showing you some code, so here is my CUDA kernel function:

(I've cut a lot of stuff out of this, most of which was just commented-out old code - so hopefully I haven't screwed anything up.)

... be prepared for what basically just looks like an obscure algorithm.

__global__
void cuda_kernel_func(std::mt19937_64 *gen_p, double *value, double *r1_value, uint64_t NUM_VALUES)
{

    uint64_t index = blockIdx.x * blockDim.x + threadIdx.x;
    if(index < NUM_VALUES)
    {

        const uint64_t NUM_REPEATS = 32;

        double *d_array = (double*)malloc(NUM_REPEATS * sizeof(double));

        for(uint64_t kx = 0; kx < NUM_REPEATS; ++ kx)
        {

            uint64_t NUM_RUNS = 1000;
            int *d_log = (int*)malloc(NUM_RUNS * sizeof(int));

            for(uint64_t run_ix = 0; run_ix < NUM_RUNS; ++ run_ix)
            {
                int NUM_C = 10;
                simulation s(NUM_C, gen_p, r1_value[index]);
                s.run(100 * NUM_C);
                if(s.get() == true)
                {
                    d_log[run_ix] = 1;
                }
                else
                {
                    d_log[run_ix] = 0;
                }
            }

            long long sum = 0;
            for(uint64_t ix = 0; ix < NUM_RUNS; ++ ix)
            {
                sum += d_log[ix];
            }

            double ratio = (double)sum / (double)NUM_RUNS;

            d_array[kx] = ratio;

            free(d_log);

        }

        double mean = 0.0;
        for(uint64_t kx = 0; kx < NUM_REPEATS; ++ kx)
        {
            mean += d_array[kx];
        }
        mean /= (double)NUM_REPEATS;

        value[index] = mean;

        free(d_array);
    }
}

And here is my `simulation` class.

class simulation
{

public:

    simulation(uint64_t nc, std::mt19937_64 *mt19937_64_pointer, double r1_input)
        : dis_p{new std::uniform_real_distribution<double>(0.0, 1.0)}
    {
        gen_p = mt19937_64_pointer;

        m_r1 = r1_input;

        cell default_cell; // was `cl default;` - `default` is a reserved keyword, and the type should match `std::vector<cell>`

        v.reserve(nc);
        for(uint64_t ix = 0; ix < nc; ++ ix)
            v.push_back(default_cell);
    }

    ~simulation()
    {
        delete dis_p;
    }

    void run(uint64_t num_steps)
    {
        // Algorithm code (Omitted)
    }

    bool get()
    {
        // Algorithm code (Omitted)
    }

private:
    std::vector<cell> v;

    std::mt19937_64 *gen_p;
    std::uniform_real_distribution<double> *dis_p;

    double m_r1;

};

If you spot an inconsistency, add a comment and I'll fix it.

This is quite a general question - I'm sure this error message is frequently encountered. I've googled it of course, but didn't find much helpful info.

FreelanceConsultant
  • you need to mark the methods with `__device__` (see the linked duplicate); in your case you need to mark all methods of class `simulation`, which includes the constructor – m.s. Oct 08 '15 at 13:40
  • aside from that, some more things: (1) I would keep dynamic memory allocation to a minimum (no need to dynamically allocate `gen_p` and `dis_p`); (2) use `thrust::random` instead of `std::...` (see [documentation](http://thrust.github.io/doc/group__random.html)) – m.s. Oct 08 '15 at 13:43
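Putting the comments' suggestions together, here is a minimal sketch (my reading of the advice, not code from the thread) of how the class could be made callable from a kernel: every member function the device code touches is marked `__device__`, the engine and distribution come from `thrust::random` instead of `<random>` and are plain members rather than heap allocations, and `std::vector` is replaced by a fixed-size array, since standard library containers are host-only. The `MAX_CELLS` bound and the per-thread `seed` parameter are assumptions I introduced for illustration.

```cuda
#include <cstdint>
#include <thrust/random.h>

struct cell { /* fields omitted, as in the question */ };

class simulation
{
public:
    // __device__ on the constructor too - the kernel constructs the object
    __device__ simulation(uint64_t nc, uint64_t seed, double r1_input)
        : gen(seed), dis(0.0, 1.0), num_cells(nc), m_r1(r1_input)
    {
        // default-construct nc cells, up to the fixed capacity
        for(uint64_t ix = 0; ix < nc && ix < MAX_CELLS; ++ix)
            v[ix] = cell{};
    }

    __device__ void run(uint64_t num_steps)
    {
        // Algorithm code (omitted); draw uniform numbers with dis(gen)
    }

    __device__ bool get()
    {
        // Algorithm code (omitted)
        return false;
    }

private:
    static const uint64_t MAX_CELLS = 64;  // assumed upper bound on nc
    cell v[MAX_CELLS];                     // fixed array instead of std::vector
    uint64_t num_cells;

    thrust::default_random_engine gen;               // device-capable RNG
    thrust::uniform_real_distribution<double> dis;   // replaces std::uniform_real_distribution

    double m_r1;
};
```

With this shape, each thread would seed its own engine (e.g. from the thread index) rather than sharing a single `std::mt19937_64` through a pointer, which also avoids the data race the original `gen_p` would have had between threads.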
