External call of a class method in a kernel

Question

I have a class FPlan that has a number of methods such as perturb and packing.

__host__ __device__ void Perturb_action(FPlan *dfp){
  dfp->perturb();
  dfp->packing();
}

__global__ void Vector_Perturb(FPlan **dfp, int n){
  int i=threadIdx.x;
  if(i<n) Perturb_action(dfp[i]);
}

in main:

FPlan **fp_vec;
fp_vec=(FPlan**)malloc(VEC_SIZE*sizeof(FPlan*));
//initialize the vec
for(int i=0; i<VEC_SIZE;i++)
 fp_vec[i]=&fp;
//fp of type FPlan that is initialized

int v_sz=sizeof(fp_vec);
double test=fp_vec[0]->getCost();
printf("the cost before perturb %f\n", test);
FPlan **value;
cudaMalloc(&value,v_sz);
cudaMemcpy(value,&fp_vec,v_sz,cudaMemcpyHostToDevice);

//call kernel
dim3 threadsPerBlock(VEC_SIZE);
dim3 numBlocks(1);
Vector_Perturb<<<numBlocks,threadsPerBlock>>> (value,VEC_SIZE);
cudaMemcpy(fp_vec,value,v_sz,cudaMemcpyDeviceToHost);
test=fp_vec[0]->getCost();
printf("the cost after perturb %f\n", test);
test=fp_vec[1]->getCost();
printf("the cost after perturb %f\n", test);

Before the perturbation, the cost printed for fp_vec[0] is 0.8. After the perturbation, fp_vec[0] prints inf and fp_vec[1] prints 0.8.

The expected output after the perturbation would be something like 0.7 for fp_vec[0] and 0.9 for fp_vec[1]. I want to apply these perturbations to every element of an array of FPlan objects.

What am I missing? Is calling an external function supported in CUDA?

SO [expects](http://stackoverflow.com/help/on-topic), for questions like this ("why isn't this code working?"), that you provide a [complete MCVE code](http://stackoverflow.com/help/mcve). — Robert Crovella, Feb 12 '15 at 15:14

score 1 · Answer 1 · edited May 23 '17 at 12:05

This seems to be a common problem these days:

Consider the following code:

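#include <stdio.h>
#include <stdlib.h>
int main() {
    int* arr = (int*) malloc(100);  /* 100 bytes on the heap */
    /* sizeof(arr) is the size of the pointer, not of the allocation */
    printf("sizeof(arr) = %zu\n", sizeof(arr));
    free(arr);
    return 0;
}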

What is the expected output? 100? No, it is 4 (at least on a 32-bit machine; 8 on a typical 64-bit one). sizeof returns the size of the type of its operand, not the allocated size of an array.

int v_sz=sizeof(fp_vec);
double test=fp_vec[0]->getCost();
printf("the cost before perturb %f\n", test);
FPlan **value;
cudaMalloc(&value,v_sz);
cudaMemcpy(value,&fp_vec,v_sz,cudaMemcpyHostToDevice);

You are allocating 4 (or 8) bytes on the device and copying 4 (or 8) bytes, which is the size of a single pointer rather than VEC_SIZE*sizeof(FPlan*). The result is undefined (and probably garbage every time).
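As a minimal sketch of the size fix, assuming a hypothetical one-field `FPlan` stand-in (the real class is not shown in the question): the byte count is the element count times `sizeof(FPlan*)`, and the source argument is the array `fp_vec` itself rather than `&fp_vec`. The helper names `pointer_array_bytes` and `round_trip_pointer_array` are made up for illustration.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the question's FPlan class.
struct FPlan { double cost; };

// Byte count of an array of n pointers: element count times element size.
// sizeof(fp_vec) would give only the size of one FPlan** (4 or 8 bytes).
size_t pointer_array_bytes(int n) {
    return static_cast<size_t>(n) * sizeof(FPlan*);
}

// Copies n host pointers to the device and back, and returns true if every
// entry survived the round trip. The source is fp_vec, not &fp_vec.
bool round_trip_pointer_array(FPlan** fp_vec, int n) {
    if (n <= 0) return false;
    size_t bytes = pointer_array_bytes(n);
    FPlan** d_vec = nullptr;
    if (cudaMalloc(&d_vec, bytes) != cudaSuccess) return false;
    std::vector<FPlan*> back(n, nullptr);
    bool ok =
        cudaMemcpy(d_vec, fp_vec, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(back.data(), d_vec, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = (back[i] == fp_vec[i]);
    cudaFree(d_vec);
    return ok;
}
```

Even with the correct size, the copied entries are still host addresses, so a kernel cannot dereference them; this sketch only addresses the byte count and the source argument.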

Besides that, you should do proper error checking of your CUDA calls. Have a look: What is the canonical way to check for errors using the CUDA runtime API?
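One possible shape for such error checking is sketched below. The names `cuda_check`, `CUDA_CHECK`, `noop_kernel`, and `launch_checked` are hypothetical and chosen for this example; only `cudaGetErrorString`, `cudaGetLastError`, and `cudaDeviceSynchronize` are runtime API calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed runtime call with file and line, then exit.
inline void cuda_check(cudaError_t err, const char* file, int line) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s at %s:%d\n",
                     cudaGetErrorString(err), file, line);
        std::exit(EXIT_FAILURE);
    }
}
#define CUDA_CHECK(call) cuda_check((call), __FILE__, __LINE__)

// Placeholder kernel, used only to demonstrate checking a launch.
__global__ void noop_kernel() {}

// Launches a kernel and checks both the launch and its execution.
void launch_checked() {
    noop_kernel<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
}
```

Wrapping the question's calls in the same way, for example `CUDA_CHECK(cudaMalloc(&value, bytes));`, would turn a silent failure into a message with a file name and line number.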

I think there are other problems with the code as well. You cannot copy an array of objects to the device with a single `cudaMemcpy` call using an array of pointers. — Robert Crovella, Feb 12 '15 at 15:12
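A common way around the pointer-array problem is to copy the objects themselves as one contiguous array. The sketch below assumes `FPlan` is trivially copyable and holds no internal pointers, which the question does not confirm; the struct body, the placeholder `perturb` logic, and the helper `perturb_on_device` are all hypothetical.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical, trivially copyable stand-in for FPlan (no internal pointers).
struct FPlan {
    double cost;
    __host__ __device__ void perturb() { cost += 0.1; }  // placeholder logic
    __host__ __device__ void packing() {}                 // placeholder
    __host__ __device__ double getCost() const { return cost; }
};

// One thread per object; the kernel receives the objects, not pointers to them.
__global__ void Vector_Perturb(FPlan* plans, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) { plans[i].perturb(); plans[i].packing(); }
}

// Copies the objects to the device, perturbs each one, and copies them back.
// Returns false on any CUDA error.
bool perturb_on_device(std::vector<FPlan>& plans) {
    if (plans.empty()) return true;
    int n = static_cast<int>(plans.size());
    size_t bytes = plans.size() * sizeof(FPlan);
    FPlan* d_plans = nullptr;
    if (cudaMalloc(&d_plans, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(d_plans, plans.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        Vector_Perturb<<<blocks, threads>>>(d_plans, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(plans.data(), d_plans, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_plans);
    return ok;
}
```

If the real `FPlan` owns heap memory, a flat copy like this is not enough: each owned buffer would need its own device allocation and copy, or the class would need to be flattened first.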
