After some dialog in the comments, it seems that OP's posted code has no issues. I was able to successfully compile and run this test case built around that code, and so was OP:
    $ cat t1005.cu
    #include <iostream>

    struct Thing {
        bool is_solid;
        double matrix[9];
    };

    int main(){
        Thing *things;
        int num_of_things = 100;
        cudaError_t ret = cudaMallocManaged((void **)&things, num_of_things * sizeof(Thing));
        if (ret != cudaSuccess) {
            std::cout << cudaGetErrorString(ret) << std::endl;
            return 1;}
        else {
            things[10].is_solid = true;
            std::cout << "Success!" << std::endl;
            return 0;}
    }
    $ nvcc -arch=sm_30 -o t1005 t1005.cu
    $ ./t1005
    Success!
    $
Regarding this question:
Is it even best practice to do it this way rather than pass a single struct with arrays that are num_of_things large?
Yes, this is a sensible practice and is usable whether managed memory is being used or not. An array of more or less any structure that does not contain embedded pointers to dynamically allocated data elsewhere can be transferred to the GPU in a simple fashion using a single cudaMemcpy call (for example, if managed memory were not being used).
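As a rough sketch of the non-managed case, the same array of Thing could be moved with one cudaMemcpy call in each direction. The kernel mark_solid and the helper round_trip are hypothetical names I made up for illustration; they are not from OP's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct Thing {
    bool is_solid;
    double matrix[9];
};

// Hypothetical kernel, for illustration only: marks every element as solid.
__global__ void mark_solid(Thing *things, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) things[i].is_solid = true;
}

// Hypothetical helper: copies the whole array to the device with one
// cudaMemcpy, runs the kernel, and copies it back with another.
// Returns the first error encountered, or cudaSuccess.
cudaError_t round_trip(Thing *h_things, int n) {
    Thing *d_things = nullptr;
    size_t bytes = n * sizeof(Thing);
    cudaError_t err = cudaMalloc((void **)&d_things, bytes);
    if (err != cudaSuccess) return err;
    // No embedded pointers in Thing, so a flat byte copy is enough.
    err = cudaMemcpy(d_things, h_things, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        mark_solid<<<(n + 127) / 128, 128>>>(d_things, n);
        err = cudaGetLastError();  // catches launch errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_things, d_things, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_things);
    return err;
}

int main() {
    const int num_of_things = 100;
    Thing *things = new Thing[num_of_things]();  // value-initialized on the host
    cudaError_t err = round_trip(things, num_of_things);
    if (err != cudaSuccess) {
        std::printf("%s\n", cudaGetErrorString(err));
        delete[] things;
        return 1;
    }
    std::printf("things[10].is_solid = %d\n", things[10].is_solid);
    delete[] things;
    return 0;
}
```

If Thing held a pointer to separately allocated data, this flat copy would carry over a host address that is meaningless on the device, which is why the no-embedded-pointers condition matters.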
To address the question about the 3rd (flags) parameter to cudaMallocManaged:
- If it is specified, it is not correct to pass zero (although OP's posted code gives no evidence of that.) You should use one of the documented choices.
- If it is not specified, this is still valid, and a default argument of cudaMemAttachGlobal is provided. This can be confirmed by reviewing the cuda_runtime.h file or else simply compiling/running the test code above. This particular point appears to be an oversight in the documentation, and I've filed an internal issue at NVIDIA to take a look at that. So it's possible the documentation may change in the future with respect to this.
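To make the two points above concrete, here is a small sketch that tries the call with the flags argument omitted, with each of the two documented choices (cudaMemAttachGlobal and cudaMemAttachHost), and with zero. The helpers try_default and try_flags are hypothetical names of mine. I have not hard-coded what the zero case reports; the program just prints whatever the runtime returns.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: two-argument form, relying on the default flags
// argument supplied by the C++ overload in cuda_runtime.h.
cudaError_t try_default() {
    double *p = nullptr;
    cudaError_t err = cudaMallocManaged((void **)&p, 16 * sizeof(double));
    if (err != cudaSuccess) return err;
    p[0] = 1.0;  // touch from the host to show the allocation is usable
    return cudaFree(p);
}

// Hypothetical helper: three-argument form with an explicit flags value.
cudaError_t try_flags(unsigned int flags) {
    double *p = nullptr;
    cudaError_t err = cudaMallocManaged((void **)&p, 16 * sizeof(double), flags);
    if (err != cudaSuccess) return err;  // p was never allocated; do not use it
    p[0] = 1.0;
    return cudaFree(p);
}

int main() {
    std::printf("flags omitted:       %s\n", cudaGetErrorString(try_default()));
    std::printf("cudaMemAttachGlobal: %s\n",
                cudaGetErrorString(try_flags(cudaMemAttachGlobal)));
    std::printf("cudaMemAttachHost:   %s\n",
                cudaGetErrorString(try_flags(cudaMemAttachHost)));
    // Zero is not one of the documented choices; print whatever comes back.
    std::printf("flags = 0:           %s\n", cudaGetErrorString(try_flags(0)));
    return 0;
}
```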
Finally, proper CUDA error checking is always in order any time you are having trouble with a CUDA code, and it may shed some light on any errors that are made. The seg fault that the OP reported in code comments was almost certainly due to the cudaMallocManaged call failing (perhaps because a zero parameter was supplied incorrectly), and as a result the pointer in question (things) had no actual allocation. Subsequent usage of that pointer would lead to a seg fault. My test code demonstrates how to avoid that seg fault, even if the cudaMallocManaged call fails for some reason, and the key is proper error checking.
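One common way to apply this consistently is to wrap every runtime call in a checking macro. The following is only a sketch of that pattern: the macro name CUDA_CHECK, the kernel mark_solid, and the helper count_solid are names I chose for illustration, not part of the CUDA API or OP's code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: report the error string and location, then stop,
// so a failed call is never followed by use of an invalid pointer.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",             \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(1);                                                 \
        }                                                                 \
    } while (0)

struct Thing {
    bool is_solid;
    double matrix[9];
};

// Hypothetical kernel, for illustration only: marks every element as solid.
__global__ void mark_solid(Thing *things, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) things[i].is_solid = true;
}

// Hypothetical helper: allocates managed memory, runs the kernel, and
// counts the solid elements on the host, checking every step.
int count_solid(int n) {
    Thing *things = nullptr;
    CUDA_CHECK(cudaMallocManaged((void **)&things, n * sizeof(Thing)));
    mark_solid<<<(n + 127) / 128, 128>>>(things, n);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors during execution; also
                                          // finishes the kernel before the
                                          // host reads managed memory
    int count = 0;
    for (int i = 0; i < n; ++i) count += things[i].is_solid ? 1 : 0;
    CUDA_CHECK(cudaFree(things));
    return count;
}

int main() {
    std::printf("%d of %d things are solid\n", count_solid(100), 100);
    return 0;
}
```

With this in place, a failing cudaMallocManaged call produces a readable message and an exit instead of a seg fault on the first dereference of things.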