I'm considering options for processing data on a GPU when the data is too big to fit in GPU memory, and I have a few questions.
If I understand correctly, with mapped (zero-copy) memory the data resides in host memory and is transferred to the GPU only when accessed, so it shouldn't be a problem to allocate more than fits into GPU memory.
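For concreteness, here is a minimal sketch of the mapped-memory path I mean (the kernel, sizes, and names are just illustrative; the host buffer could in principle be larger than device memory):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each access from the kernel goes over the bus to host memory (zero-copy).
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 20;  // illustrative size; could exceed GPU memory

    // Pinned host allocation, mapped into the device address space.
    float *h_ptr, *d_ptr;
    cudaHostAlloc(&h_ptr, n * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&d_ptr, h_ptr, 0);

    for (size_t i = 0; i < n; ++i) h_ptr[i] = 1.0f;

    // The kernel reads and writes host memory directly; no cudaMemcpy,
    // and no staging copy in device global memory.
    scale<<<(unsigned)((n + 255) / 256), 256>>>(d_ptr, n, 2.0f);
    cudaDeviceSynchronize();

    printf("h_ptr[0] = %f\n", h_ptr[0]);
    cudaFreeHost(h_ptr);
}
```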
UVA is similar to mapped memory, but the data can be stored in either CPU or GPU memory. Can the GPU then still access host memory (as with mapped memory) while its own memory is full, and can a memory overflow happen in this case? I've read that with mapped memory the data is fetched directly by the multiprocessors without being staged in global memory first, so there shouldn't be any overflow. Is that true, and if so, is it also true for UVA?
In CUDA 6.0, Unified Memory (UM) doesn't allow oversubscribing GPU memory (and generally doesn't allow allocating more memory than the GPU has, even if the data would reside in host memory), but with CUDA 8.0 on Pascal this becomes possible (https://devblogs.nvidia.com/parallelforall/beyond-gpu-memory-limits-unified-memory-pascal/). Did I get that right?
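To show what I mean by oversubscription, here's a sketch of the case I'm asking about, assuming CUDA 8.0+ and a Pascal-class GPU (on older setups I'd expect the allocation itself to fail):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Touching each byte forces the corresponding managed page to be
// faulted in and migrated to the GPU on demand.
__global__ void touch(char *p, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = 1;
}

int main() {
    size_t free_b, total_b;
    cudaMemGetInfo(&free_b, &total_b);

    // Deliberately request more managed memory than the GPU has.
    // With CUDA 8.0 on Pascal this should succeed; pages are evicted
    // back to host memory as needed while the kernel runs.
    size_t n = total_b + total_b / 2;
    char *p;
    cudaError_t err = cudaMallocManaged(&p, n);
    if (err != cudaSuccess) {
        printf("allocation failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    touch<<<(unsigned)((n + 255) / 256), 256>>>(p, n);
    cudaDeviceSynchronize();
    cudaFree(p);
}
```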