I am currently working on a CUDA application that should use as much global device memory (VRAM) as is available when the processed data is sufficiently large. I allocate a 3D volume with cudaMalloc3D, so the memory I use must be contiguous. To that end I tried retrieving the amount of free device memory with cudaMemGetInfo and then allocating that much. However, this does not work; I still get errors when trying to allocate that amount of memory.
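Here is a stripped-down version of what I am trying; the 512 x 512 float slices are just placeholders for my real volume dimensions:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        size_t freeBytes = 0, totalBytes = 0;
        cudaMemGetInfo(&freeBytes, &totalBytes);
        printf("free: %zu MiB of %zu MiB\n", freeBytes >> 20, totalBytes >> 20);

        // Derive the deepest volume that should fit into the reported
        // free memory (512 x 512 slices of float are placeholders).
        size_t widthBytes = 512 * sizeof(float);
        size_t height = 512;
        size_t depth = freeBytes / (widthBytes * height);

        cudaPitchedPtr volume;
        cudaError_t err =
            cudaMalloc3D(&volume, make_cudaExtent(widthBytes, height, depth));
        if (err != cudaSuccess) {
            // This is what I see: the allocation fails even though
            // cudaMemGetInfo reported that much memory as free.
            printf("cudaMalloc3D failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        cudaFree(volume.ptr);
        return 0;
    }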
Now, my question is whether there is a way to retrieve the maximum amount of device memory that I can allocate contiguously.
One option would be a trial-and-error approach: iteratively decrease the requested size until the allocation succeeds. However, I don't like this idea very much.
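In code, that fallback would look roughly like this (findMaxDepth is just a name I made up for the sketch; in my case the slice dimensions are fixed, so only the depth shrinks):

    #include <cuda_runtime.h>

    // Trial-and-error sketch: shrink the requested depth slice by slice
    // until cudaMalloc3D succeeds. Returns the depth that worked and
    // leaves the successful allocation in *volume (0 if nothing fits).
    size_t findMaxDepth(size_t widthBytes, size_t height, size_t startDepth,
                        cudaPitchedPtr* volume) {
        for (size_t depth = startDepth; depth > 0; --depth) {
            cudaExtent extent = make_cudaExtent(widthBytes, height, depth);
            if (cudaMalloc3D(volume, extent) == cudaSuccess)
                return depth;
        }
        return 0;
    }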
Background: I have a program that does cone-beam CT reconstruction on the GPU. Those volumes can become quite large, so I split them into chunks when necessary. Therefore I have to know the maximum chunk size that still fits into global device memory.
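For context, the chunked processing has roughly this shape (simplified, kernel launches omitted); maxChunkDepth is exactly the number I need to know up front:

    #include <algorithm>
    #include <cuda_runtime.h>

    // Simplified shape of my chunked reconstruction loop.
    void reconstructVolume(size_t widthBytes, size_t height,
                           size_t totalDepth, size_t maxChunkDepth) {
        for (size_t first = 0; first < totalDepth; first += maxChunkDepth) {
            size_t depth = std::min(maxChunkDepth, totalDepth - first);
            cudaPitchedPtr chunk;
            if (cudaMalloc3D(&chunk,
                             make_cudaExtent(widthBytes, height, depth))
                    != cudaSuccess)
                return;  // must not happen if maxChunkDepth is correct
            // ... run the reconstruction kernels for slices
            //     [first, first + depth) into this chunk ...
            cudaFree(chunk.ptr);
        }
    }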