Run class constructor/method on device. How?

Question

How do I run the constructor on the device side? When I use `SolidSphere sphere<<<1, 1>>> (1, 12, 24)`, the compiler gives me this error:

error : no default constructor exists for class "SolidSphere"

class SolidSphere
{
.
.
.
public:
    __device__ __host__ SolidSphere(float radius, unsigned int rings, unsigned int sectors)
.
.
.
};

SolidSphere sphere<<<1, 1>>> (1, 12, 24);

why would you do this: `SolidSphere sphere<<<1, 1>>> (1, 12, 24);` ? That's not how you launch a kernel, if that is your intent. You launch a kernel like this: `sphere<<<1, 1>>> (1, 12, 24);` If you just want to instantiate an object from device code, it would be `SolidSphere my_sphere(1,12,24);` It might help if you provide a more complete example of what you are trying to do. — Robert Crovella, Apr 18 '14 at 16:39
Here is my complete code: http://stackoverflow.com/questions/23045852/opengl-c-3d-sphere. Only difference, I put `__device__ __host__` before methods. — , Apr 18 '14 at 16:42
No, that's not the only difference. In this question, you are trying to kernel-configure an object instantiation. Don't do that. Get rid of the triple-chevrons: `<<<1,1>>>`. And rather than link to another question that has no CUDA code in it at all, please provide the *actual* code *in this question*. — Robert Crovella, Apr 18 '14 at 16:45
[This code](http://pastebin.com/MgEHcPV3) compiles just fine. — Robert Crovella, Apr 18 '14 at 16:53
That's the problem -- I'm trying to adjust the C code from the link, so that it runs on device, instead of host. — , Apr 18 '14 at 16:53
@Robert Crovella Dude, followed your advice, but now I get tens of warnings like: `CUDACOMPILE : warning : calling a __host__ function("std::_Iterator_base12::~_Iterator_base12") from a __host__ __device__ function("std::_Iterator_base12::~_Iterator_base12 [subobject]") is not allowed` — , Apr 19 '14 at 10:54
Well, I asked you to post your code but you haven't. I guess you think I'm psychic. But, I'm almost psychic. Judging from your post on [devtalk](https://devtalk.nvidia.com/default/topic/732405/cuda-programming-and-performance/-quot-calling-a-__host__-function-from-a-__host__-__device__-function-is-not-allowed-quot-/) you can't use `std::vector` (or most things from the STL) in device code. You'll note that the compilable example I gave doesn't have any STL stuff in it. — Robert Crovella, Apr 19 '14 at 13:40
Unfortunately, I can't just upload the code, because Stack Overflow won't let me without adding a longer description. Well, at least now I know what was wrong with it. Big thanks. — , Apr 19 '14 at 15:56

score 2 · Answer 1 · answered Apr 18 '14 at 17:02

Your biggest problem is understanding the difference between device code and a kernel function.

A class whose members are compiled as __device__ code can be instantiated inside a kernel function. A kernel function (marked __global__) is the entry point from the host into device code.

This is what you have:

class SolidSphere
{
public:
    __device__ __host__ SolidSphere(float radius, 
          unsigned int rings, unsigned int sectors);
};

This is what you need:

__global__ void KernelSolidSphere(/** inputs and outputs */) {
     // notice this is how you use __device__ compiled code
     SolidSphere sphere(10.32f, 3, 5);
     // use the sphere here
     return;
}

And this is how you call it from the host side:

KernelSolidSphere<<<1, 1>>>(/** inputs and outputs */);
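
Putting those pieces together, a minimal sketch might look like the following. Only the constructor signature comes from the question; the data members, the `volume()` method, the `out` parameter, and the `sphere_volume_on_device` helper are assumptions added for illustration, since the original class is not shown in full.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the asker's class: only the constructor
// signature is from the question; the members and volume() are assumed.
class SolidSphere
{
public:
    __host__ __device__ SolidSphere(float radius, unsigned int rings, unsigned int sectors)
        : radius_(radius), rings_(rings), sectors_(sectors) {}

    __host__ __device__ float volume() const
    {
        return 4.0f / 3.0f * 3.14159265f * radius_ * radius_ * radius_;
    }

private:
    float radius_;
    unsigned int rings_;
    unsigned int sectors_;
};

// The kernel is the entry point; the object is constructed inside it.
__global__ void KernelSolidSphere(float radius, float* out)
{
    SolidSphere sphere(radius, 12, 24);  // runs the constructor on the device
    *out = sphere.volume();
}

// Host helper (assumed name): launches the kernel and returns the result,
// or a negative value if a CUDA call fails.
float sphere_volume_on_device(float radius)
{
    float* d_out = nullptr;
    float h_out = -1.0f;
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) return -1.0f;

    KernelSolidSphere<<<1, 1>>>(radius, d_out);

    // cudaMemcpy on the default stream waits for the kernel before copying.
    if (cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess)
        h_out = -1.0f;
    cudaFree(d_out);
    return h_out;
}
```

Calling `sphere_volume_on_device(1.0f)` from `main` would construct the sphere on the GPU and copy its volume back. Note that the triple chevrons appear only on the kernel launch, never on the object declaration.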

When I first started learning I used this resource abundantly. This should provide everything you need.

IdeaHat · Answer 2 · 2014-04-18T17:45:29.167

I'm going to take a stab at this, with the clarification that I've never done this and this is just my understanding.

__device__ calls can be made from a kernel (__global__). You cannot have __global__ member functions of a class.

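What you can have is a __global__ init call. It is simplest for that call to construct into memory the host has already allocated with cudaMalloc: memory obtained from in-kernel new or malloc lives on the device heap and cannot be passed to host-side runtime calls such as cudaMemcpy or cudaFree.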

If you want to initialize a memory block with a constructor, the best thing to use would be placement new:

#include <cstdio>
#include <iostream>
#include <new>      // placement new

class Point
{
public:
  __host__ __device__ Point() : x(0), y(0) {}

  __host__ __device__ Point(int a, int b) : x(a), y(b) {}

  int x, y;
};

// Construct a Point in memory that the host has already allocated.
__global__ void init_point(void* buffer, int a, int b)
{
  new (buffer) Point(a, b);
}

int main()
{
  int count = 0;
  int i = 0;

  cudaGetDeviceCount(&count);
  if (count == 0) {
    fprintf(stderr, "There is no device.\n");
    return -1;
  }

  // List every device that reports usable properties.
  int cuda_count = 0;
  for (i = 0; i < count; i++) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
      if (prop.major >= 1) { cuda_count++; }
      std::cout << "[" << i << "] --" << prop.name << std::endl;
    }
  }

  if (cuda_count == 0) {
    fprintf(stderr, "There is no device supporting CUDA.\n");
    return -1;
  }

  std::cout << std::endl << "Select device" << std::endl;
  std::cin >> i;
  cudaSetDevice(i);
  printf("CUDA initialized.\n");

  // Allocate raw device memory, then run the constructor on the device.
  void* buff;
  cudaMalloc(&buff, sizeof(Point));
  init_point<<<1, 1>>>(buff, 10, 20);
  cudaDeviceSynchronize();  // replaces the deprecated cudaThreadSynchronize()

  // Copy the constructed object back to the host and print it.
  Point cpu_point;
  cudaMemcpy(&cpu_point, buff, sizeof(Point), cudaMemcpyDeviceToHost);
  cudaFree(buff);

  std::cout << cpu_point.x << std::endl;
  std::cout << cpu_point.y << std::endl;
  getchar();
  getchar();
  return 0;
}

Obviously, init_point could be extended to initialize many points in parallel, with each thread constructing one point.
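
A rough sketch of that multithreaded variant, with one thread per element, could look like this. The kernel name `init_points`, the helper `make_points_on_device`, and the choice of `(idx, 2 * idx)` as constructor arguments are all assumptions for illustration; `Point` is repeated only so the block is self-contained.

```cuda
#include <new>      // placement new
#include <vector>
#include <cuda_runtime.h>

struct Point
{
    __host__ __device__ Point() : x(0), y(0) {}
    __host__ __device__ Point(int a, int b) : x(a), y(b) {}
    int x, y;
};

// Each thread constructs one Point in place. The bounds check protects
// the threads of the last block that fall past the end of the buffer.
__global__ void init_points(Point* buffer, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        new (&buffer[idx]) Point(idx, 2 * idx);
    }
}

// Host helper (assumed name): builds n Points on the device and copies
// them back. std::vector is used on the host side only.
std::vector<Point> make_points_on_device(int n)
{
    std::vector<Point> host;
    if (n <= 0) return host;
    host.resize(n);

    Point* d_points = nullptr;
    if (cudaMalloc(&d_points, n * sizeof(Point)) != cudaSuccess) return host;

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up
    init_points<<<blocks, threads>>>(d_points, n);

    // cudaMemcpy on the default stream waits for the kernel before copying.
    cudaMemcpy(host.data(), d_points, n * sizeof(Point), cudaMemcpyDeviceToHost);
    cudaFree(d_points);
    return host;
}
```

The block size of 256 is an arbitrary choice here; any launch configuration that covers `n` threads would do.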

Be warned that an array-of-structures layout is typically much slower than a structure-of-arrays layout on CUDA architectures.
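
To make the two layouts concrete, here is a small sketch that sums the `x` field under each one. The type names `PointAoS` and `PointsSoA`, the kernels, and the helper `sum_x_both_layouts` are hypothetical and not from the original answer; the sketch only shows the difference in memory access and makes no measured performance claim.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Array of structures: x and y of one point sit next to each other.
struct PointAoS { int x, y; };

// Structure of arrays: all x values are contiguous, as are all y values.
struct PointsSoA { int* x; int* y; };

// Neighbouring threads read x fields that are sizeof(PointAoS) bytes apart.
__global__ void sum_x_aos(const PointAoS* pts, int n, int* out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) atomicAdd(out, pts[idx].x);
}

// Neighbouring threads read adjacent ints, so the loads can coalesce.
__global__ void sum_x_soa(PointsSoA pts, int n, int* out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) atomicAdd(out, pts.x[idx]);
}

// Host helper (assumed name): fills both layouts with x = i, y = 2 * i,
// sums x with each kernel, and writes the AoS result to sums[0] and the
// SoA result to sums[1]. Returns false if a CUDA call fails.
bool sum_x_both_layouts(int n, int sums[2])
{
    if (n <= 0) return false;
    std::vector<PointAoS> h_aos(n);
    std::vector<int> h_x(n), h_y(n);
    for (int i = 0; i < n; ++i) {
        h_aos[i].x = i;  h_aos[i].y = 2 * i;
        h_x[i] = i;      h_y[i] = 2 * i;
    }

    PointAoS* d_aos = nullptr;
    PointsSoA d_soa = {nullptr, nullptr};
    int* d_sums = nullptr;
    bool ok = cudaMalloc(&d_aos, n * sizeof(PointAoS)) == cudaSuccess
           && cudaMalloc(&d_soa.x, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_soa.y, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_sums, 2 * sizeof(int)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_aos, h_aos.data(), n * sizeof(PointAoS), cudaMemcpyHostToDevice);
        cudaMemcpy(d_soa.x, h_x.data(), n * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(d_soa.y, h_y.data(), n * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemset(d_sums, 0, 2 * sizeof(int));

        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        sum_x_aos<<<blocks, threads>>>(d_aos, n, d_sums);
        sum_x_soa<<<blocks, threads>>>(d_soa, n, d_sums + 1);
        ok = cudaMemcpy(sums, d_sums, 2 * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_aos);
    cudaFree(d_soa.x);
    cudaFree(d_soa.y);
    cudaFree(d_sums);
    return ok;
}
```

Both kernels compute the same sum; the only difference is the stride between the addresses that adjacent threads touch, which is what the warning above is about.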
