I've been trying to get some of my other code to run, and I've run into an issue with dynamic shared memory. According to the documentation (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared), I'm supposed to declare a single `extern __shared__` array and then cast pointers to specific offsets within that array, like this:
```cuda
extern __shared__ float array[];
short* array0 = (short*)array;
float* array1 = (float*)&array0[128];
int* array2 = (int*)&array1[64];
```
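A minimal sketch of that documented pattern as a complete kernel, assuming one block of 128 threads and the same region sizes (128 `short`s, 64 `float`s, plus 32 `int`s chosen here for illustration). The names `carveKernel` and `runCarveDemo` are hypothetical. Each region starts on a boundary aligned for its type (byte offsets 0, 256 and 512), which matters because the guide also warns that these pointers must be aligned to the type they point to.

```cuda
#include <cuda_runtime.h>

// Region sizes are illustrative assumptions, matching the guide's 128/64 split.
constexpr int kShorts = 128;
constexpr int kFloats = 64;
constexpr int kInts   = 32;

__global__ void carveKernel(int* out) {
    extern __shared__ float array[];            // one dynamic buffer for everything
    short* array0 = (short*)array;              // bytes [0, 256)
    float* array1 = (float*)&array0[kShorts];   // bytes [256, 512)
    int*   array2 = (int*)&array1[kFloats];     // bytes [512, 640)

    int t = threadIdx.x;
    if (t < kShorts) array0[t] = (short)t;
    if (t < kFloats) array1[t] = 0.5f * t;
    if (t < kInts)   array2[t] = 2 * t;
    __syncthreads();

    if (t == 0) {
        // Read the last element of each region back to show they do not overlap.
        out[0] = array0[kShorts - 1];
        out[1] = (int)(array1[kFloats - 1] * 2.0f);
        out[2] = array2[kInts - 1];
    }
}

// Launches one block with exactly enough dynamic shared memory for the three
// regions and copies the three sample values into host_out.
// Returns the first error encountered, or cudaSuccess.
cudaError_t runCarveDemo(int host_out[3]) {
    const size_t bytes = kShorts * sizeof(short) + kFloats * sizeof(float)
                       + kInts * sizeof(int);
    int* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, 3 * sizeof(int));
    if (err != cudaSuccess) return err;
    carveKernel<<<1, kShorts, bytes>>>(d_out);
    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(host_out, d_out, 3 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err;
}
```

Called from `main`, `runCarveDemo(out)` should fill `out` with 127, 63 and 62. Note that the buffer only ever holds plain scalars here, so no constructors are involved.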
However, in my own code this doesn't always work, and I can't figure out why.
My basic structure has two classes, `A` and `B`, as well as an error-checking macro:
```cuda
#include <cstdio>
#include <cstdlib>

#define cudaCheckError() { \
    cudaError_t err = cudaGetLastError(); \
    if(err != cudaSuccess) { \
        printf("Cuda error: %s:%d: Error code %d, %s\n", __FILE__, __LINE__, err, cudaGetErrorString(err)); \
        exit(1); \
    } \
}
```
```cuda
class A {
public:
    __device__ virtual int foo() const = 0;
};

class B : public A {
public:
    __device__ B() {}
    __device__ virtual int foo() const override {
        return 1;
    }
};
```
and my kernel:
```cuda
__global__
void kernel() {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    extern __shared__ int shared[];
    B* b_array = (B *) &shared[0];
    if (idx == 0) {
        b_array[0] = B();
        printf("%i", b_array[0].foo());
    }
    __syncthreads();
    return;
}
```
Invoking that kernel with more than enough dynamic shared memory (`kernel<<<1, 1, 1000>>>`) and then checking the error code yields `Error code 700, an illegal memory access was encountered`. Running it under cuda-memcheck also reports an error, although a different one: `Error code 719, unspecified launch failure`.
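For reference, a sketch of how such a launch is typically checked with the macro above. Errors that occur while the kernel is running (such as code 700) are reported asynchronously, so they usually surface only after a synchronizing call like `cudaDeviceSynchronize()`, whereas the check directly after the launch catches configuration errors. The kernel `touchShared` and the helper `launchAndCheck` are hypothetical stand-ins, not the question's code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// The question's macro, repeated so this sketch compiles on its own.
#define cudaCheckError() { \
    cudaError_t err = cudaGetLastError(); \
    if(err != cudaSuccess) { \
        printf("Cuda error: %s:%d: Error code %d, %s\n", __FILE__, __LINE__, err, cudaGetErrorString(err)); \
        exit(1); \
    } \
}

// Hypothetical stand-in kernel that only uses dynamic shared memory,
// so a correct run is easy to recognize (it reports 42).
__global__ void touchShared(int* out) {
    extern __shared__ int shared[];
    shared[threadIdx.x] = 42;
    __syncthreads();
    if (threadIdx.x == 0) *out = shared[0];
}

int launchAndCheck() {
    int* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));
    cudaCheckError();
    touchShared<<<1, 1, 1000>>>(d_out);  // third parameter: 1000 bytes of dynamic shared memory
    cudaCheckError();                    // launch errors (bad configuration, too much shared memory, ...)
    cudaDeviceSynchronize();
    cudaCheckError();                    // errors raised while the kernel ran, e.g. code 700
    int h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaCheckError();
    cudaFree(d_out);
    return h_out;
}
```

Because `cudaGetLastError()` returns the error from the most recent failing runtime call, placing `cudaCheckError()` after `cudaDeviceSynchronize()` is enough to report a failure that happened inside the kernel.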
Changing the kernel to:
```cuda
__global__
void kernel() {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    extern __shared__ B shared[];
    if (idx == 0) {
        shared[0] = B();
        printf("%i", shared[0].foo());
    }
    __syncthreads();
    return;
}
```
Rerunning with this version prints the expected output without an error.
Is this some sort of issue with derived classes and typecasting in CUDA? I'm not copying objects between host and device, so that shouldn't be an issue. Is it just not possible to cast to an array of objects like I want to do?
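One explanation worth testing, offered as an assumption rather than a confirmed diagnosis: `b_array[0] = B();` calls `B`'s copy-assignment operator on shared memory where no `B` was ever constructed. Assignment copies data members but does not write the object's vtable pointer (only a constructor does), so the virtual call through `b_array[0]` may read an uninitialized vtable pointer. The second kernel possibly works because the compiler can see that `shared[0]` is exactly a `B` and resolve `foo()` without consulting the vtable. The sketch below constructs the object in place with placement new instead. The kernel and helper names are hypothetical, and the 8-byte buffer alignment is an assumption checked by the `static_assert`.

```cuda
#include <cstdio>
#include <new>          // placement new
#include <cuda_runtime.h>

// The question's classes, repeated so this sketch compiles on its own.
class A {
public:
    __device__ virtual int foo() const = 0;
};

class B : public A {
public:
    __device__ B() {}
    __device__ virtual int foo() const override { return 1; }
};

__global__ void kernelPlacementNew(int* out) {
    // Raw byte buffer; B contains a vtable pointer, so it needs pointer alignment.
    extern __shared__ __align__(8) unsigned char smem[];
    static_assert(alignof(B) <= 8, "shared buffer alignment too small for B");
    B* b_array = reinterpret_cast<B*>(smem);
    if (threadIdx.x == 0) {
        new (&b_array[0]) B();        // runs B's constructor, which sets the vtable pointer
        out[0] = b_array[0].foo();    // virtual dispatch now sees a valid vtable
        printf("%i\n", out[0]);
    }
    __syncthreads();
}

// Launches the kernel with the question's configuration and copies foo()'s result back.
cudaError_t runPlacementNewDemo(int* host_result) {
    int* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, sizeof(int));
    if (err != cudaSuccess) return err;
    kernelPlacementNew<<<1, 1, 1000>>>(d_out);
    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(host_result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err;
}
```

If the assumption holds, `runPlacementNewDemo(&r)` should return `cudaSuccess` with `r == 1`. With more than one object, each thread would construct its own element with placement new before the `__syncthreads()`.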