5

I am currently working on a CUDA application that should use as much global device memory (VRAM) as is available when the processed data is sufficiently large. What I am allocating is a 3D volume using cudaMalloc3D, so the memory I use must be one contiguous allocation. For this purpose I tried retrieving the amount of free device memory with cudaMemGetInfo and then allocating that much. However, this does not work: I still get errors when trying to allocate that amount of memory.
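Simplified, what I am trying looks roughly like this (a sketch rather than my exact code; the helper name `allocWholeVolume` and the float-volume assumption are just for illustration):

```
#include <cuda_runtime.h>

// Sketch: try to put the whole volume into one allocation, sized from
// the free-memory figure reported by cudaMemGetInfo.
cudaError_t allocWholeVolume(size_t nx, size_t ny, cudaPitchedPtr* volume)
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    // Use (almost) all reported free memory for the volume depth.
    // Note: this ignores the row padding cudaMalloc3D may add for pitch.
    size_t nz = freeBytes / (nx * sizeof(float) * ny);
    cudaExtent extent = make_cudaExtent(nx * sizeof(float), ny, nz);

    return cudaMalloc3D(volume, extent);  // fails with cudaErrorMemoryAllocation
}
```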

Now, my question is whether there is a way to retrieve the maximum amount of device memory that I can allocate contiguously.

One option would be a trial-and-error approach where I iteratively decrease the amount I try to allocate until allocation succeeds. However, I don't like this idea very much.

Background: I have a program that does cone-beam CT reconstruction on the GPU. These volumes can become quite large, so I split them into chunks when necessary. I therefore need to know the largest chunk size that still fits into global device memory.

asked by bweber · edited by einpoklum

1 Answer

7

Now, my question is whether there is a way to retrieve the maximum amount of device memory that I can allocate contiguously.

There is not.

With a bit of trial and error, you can come up with an estimated maximum, say 80% of the available memory reported by cudaMemGetInfo(), and use that.

The situation with cudaMalloc is generally similar to a host-side allocator, e.g. malloc. If you queried the host operating system for the available memory, then tried to allocate all of it in a single malloc call, it would likely fail.
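For illustration, a minimal sketch of that estimate (the helper name and the 0.8 factor are placeholders, not documented limits):

```
#include <cuda_runtime.h>

// Allocate a 3D float volume of nx x ny x nz voxels, but only if the
// (unpitched) size estimate fits within ~80% of the currently free VRAM.
bool allocVolume(size_t nx, size_t ny, size_t nz, cudaPitchedPtr* vol)
{
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess)
        return false;

    size_t wanted = nx * sizeof(float) * ny * nz;   // ignores pitch padding
    if (wanted > (size_t)(0.8 * freeBytes))         // leave ~20% headroom
        return false;

    cudaExtent extent = make_cudaExtent(nx * sizeof(float), ny, nz);
    return cudaMalloc3D(vol, extent) == cudaSuccess;
}
```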

answered by Robert Crovella
  • The iterative approach is really the best way to do this. Take what cudaMemGetInfo reports as free and step downwards in 1 MiB increments until the allocation call succeeds. That is how I always do it. – talonmies Mar 31 '16 at 13:43
  • I looked for a duplicate but had some trouble locating one that would be simple enough to avoid argument. Nevertheless, there are many similar questions under the CUDA tag, and the iterative approach is outlined in the answer by @talonmies [here](http://stackoverflow.com/questions/8905949/why-is-cudamalloc-giving-me-an-error-when-i-know-there-is-sufficient-memory-spac/8923966#8923966). – Robert Crovella Mar 31 '16 at 14:00
  • @talonmies: I just tried the iterative approach, but somehow even after the malloc succeeds my kernel launch fails. It's as if I have to recover from the error somehow. Do you know what I have to do? – bweber Mar 31 '16 at 16:42
  • @user1488118: That would depend completely on what the error was. I am not going to debug code I have not seen in comments. If you have a repro case, post a new question. – talonmies Mar 31 '16 at 16:43
  • @talonmies Sorry, it was an error in my code; now it works. But after a failed attempt I have to call `cudaGetLastError()`, otherwise subsequent CUDA API calls fail and just return the out-of-memory error again. I guess this has something to do with "non-sticky" errors, see [this answer](http://stackoverflow.com/questions/31642520/states-of-memory-data-after-cuda-exceptions#answer-31642573). @Robert Crovella: There are similar questions, but they don't offer a good solution and they are quite old (2011 or so), so something might have changed in the meantime. – bweber Mar 31 '16 at 17:22
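A minimal sketch of the stepping scheme described in the comments above, assuming a float volume with fixed nx and ny and stepping the depth down one slice at a time (talonmies steps down in 1 MiB increments instead); note the cudaGetLastError() call that clears the non-sticky out-of-memory error after each failed attempt:

```
#include <cuda_runtime.h>

// Find the largest chunk depth nz that still fits in device memory.
cudaPitchedPtr allocLargestChunk(size_t nx, size_t ny, size_t* nzOut)
{
    cudaPitchedPtr vol = {};
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    // Optimistic upper bound derived from the reported free memory.
    size_t nz = freeBytes / (nx * sizeof(float) * ny);

    while (nz > 0) {
        cudaExtent extent = make_cudaExtent(nx * sizeof(float), ny, nz);
        if (cudaMalloc3D(&vol, extent) == cudaSuccess) {
            *nzOut = nz;
            return vol;
        }
        cudaGetLastError();  // clear the non-sticky cudaErrorMemoryAllocation
        --nz;                // step down (coarser steps would be faster)
    }
    *nzOut = 0;
    return vol;
}
```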