
Question 1)

When I call the CUDA driver API, I usually need to push a context (which represents a particular GPU) onto the current thread first. For an ordinary cuMemAlloc, the memory is allocated on the GPU specified by that context. But if I call cuMemAllocManaged to create unified memory, do I still need to push a GPU context first?
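Roughly the pattern I have in mind, as a simplified sketch (single GPU, driver API, no error checking):

```c
#include <cuda.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr ordinary, managed;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);        /* context tied to GPU 0; becomes current */

    cuMemAlloc(&ordinary, 1 << 20);   /* allocated on GPU 0, the context's device */

    /* Is a current context still required for this call? */
    cuMemAllocManaged(&managed, 1 << 20, CU_MEM_ATTACH_GLOBAL);

    cuMemFree(managed);
    cuMemFree(ordinary);
    cuCtxDestroy(ctx);
    return 0;
}
```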

Question 2)

Say I have 2 GPUs, each with 1 GB of DRAM. Can I allocate 2 GB of unified memory, with each GPU holding half of it?

einpoklum
Xiang Zhang

1 Answer

  1. Follow established driver API programming methods: explicitly establish a CUDA context.

  2. No, that is not how managed memory works. A managed allocation is visible, in its entirety, to every GPU in the system. This is true whether we are talking about a pre-Pascal UM regime or a Pascal-and-later UM regime, although the specific mechanism of visibility varies. Refer to the programming guide sections on UM with multi-GPU; see also the sketch after this list.
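To illustrate point 2, here is a minimal sketch with the driver API, assuming two cc 6.0+ GPUs and omitting error checking and the actual kernel launches:

```c
#include <cuda.h>

int main(void) {
    CUdevice dev0, dev1;
    CUcontext ctx0, ctx1;
    CUdeviceptr p;

    cuInit(0);
    cuDeviceGet(&dev0, 0);
    cuDeviceGet(&dev1, 1);
    cuCtxCreate(&ctx0, 0, dev0);
    cuCtxCreate(&ctx1, 0, dev1);      /* ctx1 is now current */

    /* One managed allocation, made while ctx1 happens to be current. */
    cuMemAllocManaged(&p, 1 << 20, CU_MEM_ATTACH_GLOBAL);

    /* The same pointer is visible from either context; on cc 6.0+ devices
       its pages migrate to whichever GPU touches (faults on) them. */
    cuCtxSetCurrent(ctx0);
    /* ... launch a kernel on GPU 0 that uses p ... */
    cuCtxSetCurrent(ctx1);
    /* ... launch a kernel on GPU 1 that uses the same p ... */

    cuMemFree(p);
    cuCtxDestroy(ctx1);
    cuCtxDestroy(ctx0);
    return 0;
}
```

The single pointer returned by cuMemAllocManaged is not split between the GPUs; the whole allocation is mapped to each of them, and residency is handled by demand paging (or explicit prefetching) on cc 6.0+.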

Robert Crovella
  • In the URL you pointed out, it says that for pre-sm_60 devices the currently active device is the home for the physical allocation. Is that also true for sm >= 6.0? That would mean one allocation can only live on one GPU, though it can be accessed from other GPUs. Then if I have 2 GPUs with 1 GB each, the maximum unified memory allocation I can make is 1 GB, not 2 GB. – Xiang Zhang May 25 '17 at 15:40
  • The link says that for cc 6.0 and higher, the allocation will be migrated to the GPU in question, as needed. The UM allocation is *visible to* (i.e. *mapped to*) each GPU in the system, and it will be migrated to any GPU on demand, i.e. when that GPU generates a page fault against that item, or when that item is explicitly moved, e.g. via a `cudaMemPrefetchAsync` call (see the sketch after these comments). – Robert Crovella May 25 '17 at 16:02
  • Thanks, I read the document. So for sm_60 and above, the system decides where the physical memory is located. But then, if I call `cudaMallocManaged`, why should I still push a context first? Or can I use any context? I'm asking because a context is created on one GPU; with two GPUs you will have 2 contexts, so which one should I use for the unified memory allocation? – Xiang Zhang May 26 '17 at 13:29
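For reference, a minimal runtime-API sketch of the prefetch behaviour mentioned in the comments above (assumes a cc 6.0+ device; error checking omitted):

```c
#include <cuda_runtime.h>

int main(void) {
    float *data;
    size_t bytes = 1 << 20;

    /* One managed allocation, visible to the host and to every GPU. */
    cudaMallocManaged((void **)&data, bytes, cudaMemAttachGlobal);

    /* Touch the pages on the host first. */
    for (size_t i = 0; i < bytes / sizeof(float); ++i)
        data[i] = 1.0f;

    /* Migrate the pages to device 0 ahead of time instead of letting the
       GPU page-fault on first access. */
    cudaMemPrefetchAsync(data, bytes, 0, 0);
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}
```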