I have a host class TestClass that has as a member a pointer to a class TestTable, which stores its data in an array of floats on the GPU. TestClass calls a kernel that accesses the data inside TestTable, as well as a method GetValue() from TestClass.
After reading a lot and trying out several options for which type specifiers to use on which methods and classes, as well as for how (and where) to initialize TestTable, I have the feeling that all my options eventually boil down to the same memory access error. So my understanding of how CUDA/C++ works is probably not sufficient to implement this correctly. How should my code be properly set up?
This is the content of a minimal version of my main.cu:
#include <iostream>
#include <string>
#include <cuda_runtime.h>

#define CUDA_CHECK cuda_check(__FILE__,__LINE__)

inline void cuda_check(std::string file, int line)
{
    cudaError_t e = cudaGetLastError();
    if (e != cudaSuccess) {
        std::cout << std::endl
                  << file << ", line " << line << ": "
                  << cudaGetErrorString(e) << " (" << e << ")" << std::endl;
        exit(1);
    }
}

class TestTable {
    float* vector_;
    int num_cells_;

public:
    void Init() {
        num_cells_ = 1e4;
        cudaMallocManaged(&vector_, num_cells_*sizeof(float));
        CUDA_CHECK;
    }

    void Free() {
        cudaFree(vector_);
    }

    __device__
    bool UpdateValue(int global_index, float val) {
        int index = global_index % num_cells_;
        vector_[index] = val;
        return false;
    }
};

class TestClass {
private:
    float value_;
    TestTable* test_table_;

public:
    TestClass() : value_(1.) {
        // test_table_ = new TestTable;
        cudaMallocManaged(&test_table_, sizeof(TestTable));
        test_table_->Init();
        CUDA_CHECK;
    }

    ~TestClass() {
        test_table_->Free();
        cudaFree(test_table_);
        CUDA_CHECK;
    }

    __host__ __device__
    float GetValue() {
        return value_;
    }

    __host__
    void RunKernel();
};

__global__
void test_kernel(TestClass* test_class, TestTable* test_table) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < 1e6; i += stride) {
        const float val = test_class->GetValue();
        test_table->UpdateValue(i, val);
    }
}

__host__
void TestClass::RunKernel() {
    test_kernel<<<1,1>>>(this, test_table_);
    cudaDeviceSynchronize();
    CUDA_CHECK;
}

int main(int argc, char *argv[]) {
    TestClass* test_class = new TestClass();
    std::cout << "TestClass successfully constructed" << std::endl;

    test_class->RunKernel();
    std::cout << "Kernel successfully run" << std::endl;

    delete test_class;
    std::cout << "TestClass successfully destroyed" << std::endl;

    return 0;
}
The error I get is: line 88: an illegal memory access was encountered (700).
I assume the error lies in one of these issues:

- TestTable is not created properly using new, which is probably bad. However, uncommenting test_table_ = new TestTable; in TestClass() does not resolve the issue.
- GetValue() in test_kernel does not return a valid float variable. If I replace it by an arbitrary float, e.g. 1.f, the program runs without errors. However, in the real (not minimal) version of my code, GetValue() does a bunch of computations which happen at different points in the code base, so hard-coding is not an option there.
- I never copy TestClass to the GPU, but call one of its member functions from the kernel. I see that this must cause trouble, but I do not find it intuitive where and how to copy it. If I only call GetValue() in the kernel without reusing its result, there is no error, so it seems that my program can call GetValue() without the class being copied to the GPU.
Possibly related questions that I was not able to apply to my specific problem:
- Accessing class data members from within cuda kernel - how to design proper host/device interaction? - this one looks very similar, but somehow I do not manage to translate it to my use case.
- Accessing Class Member in different CUDA kernels - here, I am not sure how the fact that I have two classes "interacting" with each other would affect the solution.
- CUDA and Classes - this question seems much more generic to me.
Any help is highly appreciated!