
As far as I know, we can allocate a pinned memory area within kernel memory (as KGPU does).

Then we can allocate Linux kernel data in that pinned memory and transfer it to the GPU.

The problem is that the Linux kernel data has to be arranged as an array.

Today my case is a tree.

I tried passing it from pinned memory to the GPU.

But when one node accesses the next node, a memory access error occurs.

I was wondering: can Unified Memory be allocated as a pinned memory area in kernel memory?

Then the tree could be built in the Unified Memory area and used by the GPU without another runtime API like cudaMallocManaged.

Or must Unified Memory only be used via cudaMallocManaged?

weichen

1 Answer

But when a node access next node, memory access error occurred.

This just means you have a bug in your code.

Or is that Unified memory must only use cudaMallocManaged?

Currently, the only way to access the features of Unified Memory is to use a managed allocator. For dynamic allocations, that is cudaMallocManaged(). For static allocations, it is via the __managed__ keyword.
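To make the two managed paths concrete, here is a minimal sketch (not from the original answer; the kernel name and values are illustrative) showing a dynamic `cudaMallocManaged()` allocation alongside a static `__managed__` variable, both directly accessible from host and device:

```cuda
#include <cstdio>

// Static managed allocation: declared at file scope with __managed__
__managed__ int m_counter;

__global__ void inc_kernel(int *p){
  (*p)++;        // device writes to the dynamically managed allocation
  m_counter++;   // device writes to the statically managed variable
}

int main(){
  int *d;
  // Dynamic managed allocation
  cudaMallocManaged(&d, sizeof(int));
  *d = 41;          // host writes directly, no cudaMemcpy needed
  m_counter = 0;
  inc_kernel<<<1,1>>>(d);
  cudaDeviceSynchronize();   // required before the host touches managed data again
  printf("*d = %d, m_counter = %d\n", *d, m_counter);  // *d = 42, m_counter = 1
  cudaFree(d);
}
```

On pre-Pascal hardware the host must not touch managed data while a kernel is running, hence the synchronize before the final `printf`.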

The programming guide has additional information.

In response to the comments below, here is a trivial worked example of creating a singly-linked list using pinned memory, and traversing that list in device code:

$ cat t1115.cu
#include <stdio.h>
#define NUM_ELE 5

struct ListElem{

   int id;
   bool last;
   ListElem *next;
};

__global__ void test_kernel(ListElem *list){

  int count = 0;
  while (!(list->last)){
    printf("List element %d has id %d\n", count++, list->id);
    list = list->next;}
  printf("List element %d is the last item in the list\n", count);
}

int main(){
  ListElem *h_list, *my_list;
  cudaHostAlloc(&h_list, sizeof(ListElem), cudaHostAllocDefault);
  my_list = h_list;
  for (int i = 0; i < NUM_ELE-1; i++){
    my_list->id = i+101;
    my_list->last = false;
    cudaHostAlloc(&(my_list->next), sizeof(ListElem), cudaHostAllocDefault);
    my_list = my_list->next;}
  my_list->last = true;
  test_kernel<<<1,1>>>(h_list);
  cudaDeviceSynchronize();
}

$ nvcc -o t1115 t1115.cu
$ cuda-memcheck ./t1115
========= CUDA-MEMCHECK
List element 0 has id 101
List element 1 has id 102
List element 2 has id 103
List element 3 has id 104
List element 4 is the last item in the list
========= ERROR SUMMARY: 0 errors
$

Note that, in the interest of brevity, I have dispensed with proper CUDA error checking in this example (although running the code with cuda-memcheck demonstrates there are no CUDA run-time errors). I recommend proper error checking any time you are having trouble with a CUDA code. Also note that this example assumes a proper UVA environment.
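For reference, one common error-checking pattern wraps every runtime call in a macro that prints the error string and location on failure (the macro name here is my own choice, not a CUDA API):

```cuda
#include <cstdio>
#include <cstdlib>

// Check the return value of every CUDA runtime call
#define cudaCheck(call)                                              \
  do {                                                               \
    cudaError_t err = (call);                                        \
    if (err != cudaSuccess) {                                        \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                   \
              cudaGetErrorString(err), __FILE__, __LINE__);          \
      exit(EXIT_FAILURE);                                            \
    }                                                                \
  } while (0)

int main(){
  int *h = nullptr;
  cudaCheck(cudaHostAlloc(&h, sizeof(int), cudaHostAllocDefault));
  cudaCheck(cudaFreeHost(h));
  printf("ok\n");
}
```

After a kernel launch (which returns no status itself), checking `cudaGetLastError()` and the result of `cudaDeviceSynchronize()` covers launch and asynchronous execution errors.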

Robert Crovella
  • Sorry, I have some questions. You mean that a pointer-based structure in pinned memory (Unified Virtual Addressing) can be correctly accessed on the GPU? – weichen Mar 21 '16 at 13:30
  • Yes, if the allocations are done correctly, a pointer based structure in pinned memory can be correctly accessed on the GPU. In fact, (before Unified Memory) a pointer based tree or linked list structure would have been one of the canonical use cases for pinned memory accessed via zero-copy. – Robert Crovella Mar 21 '16 at 15:22
  • I still can't figure out where my code is wrong, or whether it's just a limitation of the KGPU project. And I can't find any pointer-based structure example using pinned memory on Google... most examples use Unified Memory. – weichen Mar 22 '16 at 07:42
  • Thanks for your example. I just viewed it and reached a conclusion. KGPU calls `cudaHostAlloc` only once, to create one big pinned region the same size as GPU memory; that pinned region is KGPU's memory pool for allocating Linux kernel data via its own `kgpu_vmalloc` API. In your case, it seems every node is created by `cudaHostAlloc`, but calling `cudaHostAlloc` each time a node is created in the Linux kernel is a big overhead. So I think the data in KGPU must be arranged as an array for passing to the GPU. Is that right? – weichen Mar 23 '16 at 02:57
  • Supplement: KGPU uses `cudaHostRegister` to map data to the GPU – weichen Mar 23 '16 at 03:37
  • It's certainly better to allocate all your pinned memory at once. My code was just an example to prove that it's possible to write correct code that uses pinned memory for a linked list. If your code is not working right, you have a bug in your code. – Robert Crovella Mar 23 '16 at 04:43
  • Thanks for your help. I'll try it again! – weichen Mar 23 '16 at 06:28
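On the pool-allocation point from the comments: nothing requires one `cudaHostAlloc` call per node. A sketch of the same linked-list example built inside a single pinned region, with a trivial bump allocator standing in for KGPU's `kgpu_vmalloc`-style pool (the `pool_alloc` helper is hypothetical, not a KGPU or CUDA API), might look like:

```cuda
#include <cstdio>
#include <cstddef>
#define NUM_ELE 5
#define POOL_BYTES (1 << 20)

struct ListElem {
  int id;
  bool last;
  ListElem *next;
};

// Hypothetical bump allocator over one pinned region
static char  *pool_base;
static size_t pool_used;

void *pool_alloc(size_t bytes){
  void *p = pool_base + pool_used;
  pool_used += bytes;   // no alignment or overflow handling in this sketch
  return p;
}

__global__ void traverse(ListElem *list){
  int count = 0;
  while (!(list->last)){
    printf("element %d has id %d\n", count++, list->id);
    list = list->next;}
  printf("element %d is last\n", count);
}

int main(){
  // One pinned allocation up front; all nodes are carved out of it
  cudaHostAlloc(&pool_base, POOL_BYTES, cudaHostAllocDefault);
  ListElem *head = (ListElem *)pool_alloc(sizeof(ListElem));
  ListElem *cur = head;
  for (int i = 0; i < NUM_ELE-1; i++){
    cur->id = i + 101;
    cur->last = false;
    cur->next = (ListElem *)pool_alloc(sizeof(ListElem));
    cur = cur->next;}
  cur->last = true;
  traverse<<<1,1>>>(head);   // host pointers into the pool are device-valid under UVA
  cudaDeviceSynchronize();
}
```

Because the whole pool is pinned and UVA makes host and device pointers interchangeable, the node-to-node pointers remain valid on the device exactly as in the per-node `cudaHostAlloc` version. Memory registered with `cudaHostRegister` (as KGPU does) behaves the same way once registered.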