I've just started trying to learn CUDA again and came across some code I don't fully understand.
// declare GPU memory pointers
float * d_in;
float * d_out;
// allocate GPU memory
cudaMalloc((void**) &d_in, ARRAY_BYTES);
cudaMalloc((void**) &d_out, ARRAY_BYTES);
When the GPU memory pointers are declared, they allocate memory on the host. The cudaMalloc calls throw away the information that d_in and d_out are pointers to floats.
I can't think why cudaMalloc would need to know where in host memory d_in and d_out were originally stored. It's not even clear why I need to use host bytes to store whatever address d_in and d_out end up pointing to.
So, what is the purpose of the original variable declarations on the host?
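For context, here's roughly how I understand these pointers end up being used later on (the ARRAY_SIZE value, ARRAY_BYTES and the square kernel are just placeholders I've made up for illustration):

#include <cuda_runtime.h>
#include <cstdio>

#define ARRAY_SIZE 64
#define ARRAY_BYTES (ARRAY_SIZE * sizeof(float))

__global__ void square(float *out, const float *in) {
    int i = threadIdx.x;
    out[i] = in[i] * in[i];
}

int main() {
    float h_in[ARRAY_SIZE], h_out[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; ++i) h_in[i] = (float) i;

    // declare GPU memory pointers (the pointer variables themselves live on the host)
    float *d_in;
    float *d_out;

    // allocate GPU memory; cudaMalloc writes the device addresses into the host variables
    cudaMalloc((void **) &d_in, ARRAY_BYTES);
    cudaMalloc((void **) &d_out, ARRAY_BYTES);

    // the host then uses those stored device addresses to move data and launch the kernel
    cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);
    square<<<1, ARRAY_SIZE>>>(d_out, d_in);
    cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);

    printf("h_out[2] = %f\n", h_out[2]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

So the host clearly needs the values cudaMalloc writes into d_in and d_out for the later calls, but that's exactly the part I'm asking about.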
======================================================================
I would've thought something like this would make more sense:
// declare GPU memory pointers
cudaFloat * d_in;
cudaFloat * d_out;
// allocate GPU memory
cudaMalloc((void**) &d_in, ARRAY_BYTES);
cudaMalloc((void**) &d_out, ARRAY_BYTES);
This way, everything GPU-related takes place on the GPU. If d_in or d_out were accidentally used in host code, an error could be thrown at compile time, since those variables wouldn't be defined on the host.
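Just to make concrete what I mean by a compile-time error: as far as I know there's no cudaFloat or anything like it in CUDA, but I'm imagining something along the lines of this made-up DevicePtr wrapper, where host code can pass the handle around but can't accidentally treat it as an ordinary pointer:

#include <cuda_runtime.h>

// hypothetical opaque handle for a device allocation (not part of CUDA)
struct DevicePtr {
    float *raw;   // device address, kept behind the wrapper
};

__global__ void square(DevicePtr out, DevicePtr in) {
    int i = threadIdx.x;
    out.raw[i] = in.raw[i] * in.raw[i];   // device code unwraps it explicitly
}

int main() {
    DevicePtr d_in, d_out;
    cudaMalloc((void **) &d_in.raw, 64 * sizeof(float));
    cudaMalloc((void **) &d_out.raw, 64 * sizeof(float));

    // d_in[0] = 1.0f;   // with a plain float* this compiles and then misbehaves at runtime;
                         // with the wrapper it's rejected at compile time (no operator[])

    square<<<1, 64>>>(d_out, d_in);
    cudaDeviceSynchronize();

    cudaFree(d_in.raw);
    cudaFree(d_out.raw);
    return 0;
}

That's roughly the level of type safety I'd have expected to get out of the box.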
I guess what I also find confusing is that, by storing device memory addresses on the host, it feels like the device isn't fully in charge of managing its own memory. There seems to be a risk of host code accidentally overwriting the value of either d_in or d_out, whether through an accidental assignment in host code or some more subtle error, which could cause the GPU to lose access to its own memory. Also, it seems strange that the addresses assigned to d_in and d_out are chosen by the host rather than by the device. Why should the host know anything about which addresses are or aren't available on the device?
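To illustrate the kind of accident I'm worried about, here's a contrived example I've made up:

#include <cuda_runtime.h>
#include <cstdio>

#define ARRAY_BYTES (64 * sizeof(float))

int main() {
    float h_buf[64];
    float *d_in;

    cudaMalloc((void **) &d_in, ARRAY_BYTES);   // d_in now holds a device address

    d_in = h_buf;   // an ordinary host assignment silently overwrites that address;
                    // the device allocation still exists, but the host has lost the
                    // only handle to it (leaked until the context is destroyed)

    cudaError_t err = cudaFree(d_in);           // fails: d_in no longer holds the device address
    printf("cudaFree: %s\n", cudaGetErrorString(err));
    return 0;
}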
What am I failing to understand here?