I've reduced my project down to just the relevant code. What really bothers me is that it does not produce any errors. Anyway, I have a struct GpuData:
struct GpuData { float x, y, z; };
My goal is to launch a kernel that takes a pointer to this struct and a function, and applies the function to the struct. So let's look at an example kernel:
__global__ void StructFunctor(GpuData* in_dat, nvstd::function<float(void)> func) {
    in_dat->x = func();
    in_dat->y += float{1};
}
In this case, the kernel is reduced to something very simple. It will set the x value to the result of the function. It will then add 1 to the y value.
So let's try it. A complete source file (cuda_demo.cu):
#include <iostream>
#include <nvfunctional>

struct GpuData { float x, y, z; };

__global__ void StructFunctor(GpuData* in_dat, nvstd::function<float(void)> func) {
    in_dat->x = func();
    in_dat->y += float{1};
}

int main(int argc, char** argv) {
    GpuData c_dat {2, 3, 5};
    std::cout << "Input x: " << c_dat.x << " y: " << c_dat.y << " z: " << c_dat.z << std::endl;

    GpuData* g_dat;
    cudaMalloc(&g_dat, sizeof(GpuData));
    cudaMemcpy(g_dat, &c_dat, sizeof(GpuData), cudaMemcpyHostToDevice);

    StructFunctor<<<1, 1>>>(g_dat, []()->float{return 1.0f;});

    cudaMemcpy(&c_dat, g_dat, sizeof(GpuData), cudaMemcpyDeviceToHost);
    std::cout << "Output x: " << c_dat.x << " y: " << c_dat.y << " z: " << c_dat.z << std::endl;
    return 0;
}
To actually build it, you will need the CMake file. I've tacked that on at the end.
On my machine it compiles and runs without errors. Here is my output:
./CudaDemo
Input x: 2 y: 3 z: 5
Output x: 2 y: 3 z: 5
The variable has not been modified at all! But if I go back and comment out in_dat->x = func(); then I get this output:
./CudaDemo
Input x: 2 y: 3 z: 5
Output x: 2 y: 4 z: 5
Now the y value has been modified! That's a good start, but why is it that when I try to use the function, the GPU memory appears to become immutable? I presume this is some sort of error, but it compiles and runs without warnings or errors.
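One caveat: the code above never checks the return value of any CUDA call, so "no errors" may only mean that nothing was printed. Here is a minimal sketch of how the calls could be checked. CHECK_CUDA and the stand-in AddOne kernel are hypothetical names for illustration and are not part of my project; the runtime calls themselves (cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString) are standard.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original code): print the CUDA error
// string and exit if a runtime call does not return cudaSuccess.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

struct GpuData { float x, y, z; };

// Stand-in kernel (assumption): same struct, but no nvstd::function parameter.
__global__ void AddOne(GpuData* in_dat) {
    in_dat->y += 1.0f;
}

int main() {
    GpuData c_dat{2, 3, 5};
    GpuData* g_dat = nullptr;

    CHECK_CUDA(cudaMalloc(&g_dat, sizeof(GpuData)));
    CHECK_CUDA(cudaMemcpy(g_dat, &c_dat, sizeof(GpuData), cudaMemcpyHostToDevice));

    AddOne<<<1, 1>>>(g_dat);
    CHECK_CUDA(cudaGetLastError());       // errors detected at launch time
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CHECK_CUDA(cudaMemcpy(&c_dat, g_dat, sizeof(GpuData), cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(g_dat));

    std::printf("x: %g y: %g z: %g\n", c_dat.x, c_dat.y, c_dat.z);
    return 0;
}
```

Wrapping the StructFunctor launch and the copy back in the same way should show whether one of those calls is failing and leaving c_dat untouched.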
Now, as promised, the CMake file to build this:
cmake_minimum_required(VERSION 3.8)
project(Temp LANGUAGES CXX CUDA)
set(CMAKE_CUDA_STANDARD 14)
add_executable(CudaDemo cuda_demo.cu)
set_property(TARGET CudaDemo PROPERTY CUDA_SEPARABLE_COMPILATION ON)