I am just curious about how this is implemented under the hood. I tested it by allocating a huge buffer with cudaMallocManaged, and nvidia-smi isn't showing anything (unless the buffer size is under the available GDDR).
First of all, I suggest you do proper CUDA error checking on all CUDA API calls. It would seem from your description that you are not doing so.
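As a minimal sketch of what I mean by error checking: wrap every runtime API call so that a failure is reported immediately. The macro name CUDA_CHECK is my own (it is not part of the CUDA toolkit), and the 1 MiB size is arbitrary.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (CUDA_CHECK is not a toolkit name): evaluates a
// CUDA runtime call and aborts with file/line and the error string on failure.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,        \
                    __LINE__, cudaGetErrorString(err_));                  \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

int main() {
    void *buf = nullptr;
    size_t bytes = size_t(1) << 20;  // 1 MiB, arbitrary size for illustration

    // If the managed allocation fails, the macro prints why and exits.
    CUDA_CHECK(cudaMallocManaged(&buf, bytes));
    CUDA_CHECK(cudaFree(buf));

    printf("allocation and free succeeded\n");
    return 0;
}
```

With a check like this in place, a failed oversized cudaMallocManaged call shows up as an explicit error instead of a silent no-op.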
Demand paging in unified memory (UM), which lets an allocation grow beyond the GPU's physical DRAM, will only work with:
Pascal (or future) GPUs
CUDA 8 (or future) toolkit
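If you want to check these two requirements programmatically, something like the following should do, as a rough sketch. The helper meetsOversubscriptionRequirements is a name I made up; it assumes Pascal corresponds to compute capability 6.x and that the CUDA 8.0 runtime reports version 8000. The concurrentManagedAccess attribute is printed only as an extra hint; I am assuming it is a reasonable indicator of page-faulting support, and it may report 0 on some operating systems even with a Pascal GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the device is Pascal-class or newer
// (compute capability major >= 6) and the runtime is CUDA 8.0 or newer.
static bool meetsOversubscriptionRequirements(int ccMajor, int runtimeVersion) {
    return ccMajor >= 6 && runtimeVersion >= 8000;
}

int main() {
    int device = 0;  // assumption: the first GPU is the one of interest
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        fprintf(stderr, "could not query device %d\n", device);
        return 1;
    }

    int runtimeVersion = 0;
    if (cudaRuntimeGetVersion(&runtimeVersion) != cudaSuccess) {
        fprintf(stderr, "could not query runtime version\n");
        return 1;
    }

    // Extra hint: can the device access managed memory concurrently with the CPU?
    int concurrent = 0;
    if (cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess,
                               device) != cudaSuccess) {
        fprintf(stderr, "could not query concurrentManagedAccess\n");
        return 1;
    }

    printf("%s: compute capability %d.%d, runtime %d, concurrentManagedAccess=%d\n",
           prop.name, prop.major, prop.minor, runtimeVersion, concurrent);
    printf("requirements met: %s\n",
           meetsOversubscriptionRequirements(prop.major, runtimeVersion) ? "yes" : "no");
    return 0;
}
```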
Other than that, your setup should probably work. If it's not working for you with CUDA 8 (the release, not CUDA 8 RC) and a Pascal GPU, make sure that you meet the requirements for UM (e.g. a supported OS) and also do proper error checking. Rather than trying to infer what is happening from nvidia-smi, run an actual test on the allocated memory.
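By "an actual test" I mean something along these lines: allocate a managed buffer larger than the GPU's memory, have a kernel write every byte, then spot-check the result on the host. This is only a sketch under assumptions: the 1.25x factor, the fill value, and the names oversubscribedSize, fill and CUDA_CHECK are mine, and the host needs enough free RAM to back the buffer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro, not a toolkit name.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,        \
                    __LINE__, cudaGetErrorString(err_));                  \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical helper: a size 25% larger than the GPU's total memory.
static size_t oversubscribedSize(size_t totalDeviceBytes) {
    return totalDeviceBytes + totalDeviceBytes / 4;
}

// Grid-stride loop so a fixed-size grid can touch every byte of the buffer.
__global__ void fill(unsigned char *p, size_t n, unsigned char value) {
    size_t stride = size_t(blockDim.x) * gridDim.x;
    for (size_t i = size_t(blockIdx.x) * blockDim.x + threadIdx.x; i < n; i += stride)
        p[i] = value;
}

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    CUDA_CHECK(cudaMemGetInfo(&freeBytes, &totalBytes));

    size_t bytes = oversubscribedSize(totalBytes);
    printf("GPU memory: %zu bytes, managed allocation: %zu bytes\n", totalBytes, bytes);

    unsigned char *buf = nullptr;
    CUDA_CHECK(cudaMallocManaged(&buf, bytes));

    fill<<<1024, 256>>>(buf, bytes, 0x5A);
    CUDA_CHECK(cudaGetLastError());       // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors during execution

    // Spot-check on the host; these accesses migrate the touched pages back.
    bool ok = buf[0] == 0x5A && buf[bytes / 2] == 0x5A && buf[bytes - 1] == 0x5A;
    printf("%s\n", ok ? "kernel wrote the whole buffer" : "unexpected contents");

    CUDA_CHECK(cudaFree(buf));
    return ok ? 0 : 1;
}
```

If this runs to completion and the spot-check passes, oversubscription is working regardless of what nvidia-smi reports. On a setup that does not support it, I would expect one of the checked calls to report an error instead.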
For a more general description of the feature I refer you to this blog article.