Question 1)
When I call the CUDA driver API, I usually need to first push a context (which is tied to a specific GPU) onto the current thread. For a normal cuMemAlloc, the memory is allocated on the GPU specified by that context. But if I call cuMemAllocManaged to create unified memory, do I still need to push a GPU context?
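For concreteness, here is a minimal sketch of the flow I mean (error checking omitted; the cuMemAllocManaged call is the part I'm unsure about):

```c
#include <cuda.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr dptr, mptr;

    cuInit(0);
    cuDeviceGet(&dev, 0);                 // pick GPU 0
    cuDevicePrimaryCtxRetain(&ctx, dev);  // context tied to GPU 0
    cuCtxPushCurrent(ctx);                // make it current on this thread

    // Ordinary device allocation: lands on GPU 0 because of the context.
    cuMemAlloc(&dptr, 1 << 20);

    // Managed (unified) allocation: is the pushed context still required
    // here, or does this work independently of which GPU's context is
    // current? This is my question.
    cuMemAllocManaged(&mptr, 1 << 20, CU_MEM_ATTACH_GLOBAL);

    cuMemFree(dptr);
    cuMemFree(mptr);
    cuCtxPopCurrent(NULL);
    cuDevicePrimaryCtxRelease(dev);
    return 0;
}
```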
Question 2)
Say I have 2 GPUs, each with 1 GB of DRAM. Can I allocate 2 GB of unified memory, with each GPU holding half of it?
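In other words, is an allocation like the following expected to succeed on such a system (whether the managed allocation can exceed a single GPU's DRAM and be split across both is exactly what I'm asking)?

```c
#include <cuda.h>
#include <stdio.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr big;
    size_t two_gb = 2ULL << 30;  // 2 GB: more than either GPU's 1 GB DRAM

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuDevicePrimaryCtxRetain(&ctx, dev);
    cuCtxPushCurrent(ctx);

    // Can this succeed with two 1 GB GPUs, each backing half of it?
    CUresult rc = cuMemAllocManaged(&big, two_gb, CU_MEM_ATTACH_GLOBAL);
    printf("cuMemAllocManaged(2 GB) -> %d\n", (int)rc);

    if (rc == CUDA_SUCCESS) cuMemFree(big);
    cuCtxPopCurrent(NULL);
    cuDevicePrimaryCtxRelease(dev);
    return 0;
}
```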