I'm considering options for processing data on a GPU when the data is too big to fit in GPU memory, and I have a few questions.
If I understand correctly, with mapped (zero-copy) memory the data resides in host memory and is transferred to the GPU only when accessed, so it shouldn't be a problem to allocate more than fits into GPU memory.
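For concreteness, here is a minimal sketch of the mapped-memory path I mean (the kernel, sizes, and names are just illustrative; the host buffer could in principle be larger than device memory):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each access from the kernel goes over the bus to host memory (zero-copy).
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 20;  // illustrative size; could exceed GPU memory

    // Pinned host allocation, mapped into the device address space.
    float *h_ptr, *d_ptr;
    cudaHostAlloc(&h_ptr, n * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&d_ptr, h_ptr, 0);

    for (size_t i = 0; i < n; ++i) h_ptr[i] = 1.0f;

    // The kernel reads and writes host memory directly; no cudaMemcpy,
    // and no staging copy in device global memory.
    scale<<<(unsigned)((n + 255) / 256), 256>>>(d_ptr, n, 2.0f);
    cudaDeviceSynchronize();

    printf("h_ptr[0] = %f\n", h_ptr[0]);
    cudaFreeHost(h_ptr);
}
```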
UVA is similar to mapped memory, but the data can be stored in either CPU or GPU memory. Can the GPU then still access host memory (as with mapped memory) while its own memory is full, and can a memory overflow happen in this case? I've read that with mapped memory the data is fetched directly by the multiprocessors without being staged in global memory first, so there shouldn't be any overflow. Is that true, and if so, is it also true for UVA?
In CUDA 6.0, Unified Memory (UM) doesn't allow oversubscribing GPU memory (and generally doesn't allow allocating more memory than the GPU has, even if the data would reside in host memory), but with CUDA 8.0 on Pascal this becomes possible (https://devblogs.nvidia.com/parallelforall/beyond-gpu-memory-limits-unified-memory-pascal/). Did I get that right?
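To show what I mean by oversubscription, here's a sketch of the case I'm asking about, assuming CUDA 8.0+ and a Pascal-class GPU (on older setups I'd expect the allocation itself to fail):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Touching each byte forces the corresponding managed page to be
// faulted in and migrated to the GPU on demand.
__global__ void touch(char *p, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = 1;
}

int main() {
    size_t free_b, total_b;
    cudaMemGetInfo(&free_b, &total_b);

    // Deliberately request more managed memory than the GPU has.
    // With CUDA 8.0 on Pascal this should succeed; pages are evicted
    // back to host memory as needed while the kernel runs.
    size_t n = total_b + total_b / 2;
    char *p;
    cudaError_t err = cudaMallocManaged(&p, n);
    if (err != cudaSuccess) {
        printf("allocation failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    touch<<<(unsigned)((n + 255) / 256), 256>>>(p, n);
    cudaDeviceSynchronize();
    cudaFree(p);
}
```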