
I have an OpenCL pipeline that processes images/video, and it can be greedy with memory at times. It crashes on a cl::Buffer() allocation like this:

cl_int err = CL_SUCCESS;
cl::Buffer tmp = cl::Buffer(m_context, CL_MEM_READ_WRITE, sizeData, NULL, &err);

with error -4 (CL_MEM_OBJECT_ALLOCATION_FAILURE).

This occurs at a fixed point in my pipeline when using very large images. If I downscale the image slightly, it passes through this very memory-intensive part of the pipeline.

I have access to an Nvidia card with 4 GB of memory that fails at a certain point, and I also tried an AMD GPU with 2 GB, which fails earlier.

According to this thread, there is no need to track the current allocation because of swapping with VRAM, but it seems that my pipeline exhausts the memory of my device.

So here are my questions:

1) Is there any setting on my computer or in my pipeline that would allow more VRAM to be used?

2) Is it okay to use CL_DEVICE_GLOBAL_MEM_SIZE as the reference for the maximum size to allocate, or do I need to use CL_DEVICE_GLOBAL_MEM_SIZE - (local memory + private memory), or something like that?

According to my own memory profiler, 92% of CL_DEVICE_GLOBAL_MEM_SIZE is allocated at the time of the crash. After resizing the image slightly, the profiler reports 89% used and the pipeline passes, so I assume my large image is right on the edge of fitting.

Vuwox
  • You can use host memory instead, i.e. `clCreateBuffer( ... | CL_MEM_USE_HOST_PTR , ..., size, host_ptr, ...);` – Victor Gubin Mar 27 '19 at 17:31
  • @VictorGubin To use host memory, I need to provide a pointer on the host. But what should I do if I want to allocate only on the device, because I never need the data on the host (i.e. host_ptr == NULL)? I would like to avoid transfers as much as possible. – Vuwox Mar 27 '19 at 17:49
  • A pointer is a memory address, i.e. a size_t (unsigned long or unsigned long long, depending on the CPU architecture). When you create a cl buffer using a host pointer, it means the GPU will use the existing memory (i.e. RAM) at that address, and nothing will be copied from RAM to VRAM. Otherwise clCreateBuffer will allocate a memory block in VRAM and return a pointer to the allocated block. – Victor Gubin Mar 28 '19 at 11:38
  • Yes, I understand all of this. But to allocate with CL_MEM_USE_HOST_PTR, I first need to allocate something on the CPU to point to, and when using the buffer I have to wait on the bandwidth for that memory to be transferred to the GPU device. Allocating without the flag and with a NULL pointer allocates directly on the device without any transfer, which is very fast, especially when the memory needs to reside on the device and is never queried from the host. I'm just wondering whether there is a way to tell when to stop allocating like that and perhaps switch to CL_MEM_USE_HOST_PTR. – Vuwox Mar 28 '19 at 13:17

1 Answer


Some of your device's VRAM may be used for the pixel buffer, constant memory, or other purposes, so not all of it is available to your buffers. For AMD cards, you can set the environment variables GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_PERCENT to use a larger part of the VRAM, though this may have unintended side effects. Both are expressed as percentages of the memory physically available on the card.

Additionally, there is a limit on the size of each individual memory allocation. You can get the maximum size of a single allocation by querying CL_DEVICE_MAX_MEM_ALLOC_SIZE, which may be less than CL_DEVICE_GLOBAL_MEM_SIZE. For AMD cards, this size can be controlled with GPU_SINGLE_ALLOC_PERCENT. None of this requires changes to your code; simply set the variables before you call your executable:

export GPU_MAX_ALLOC_PERCENT=100
export GPU_MAX_HEAP_SIZE=100
export GPU_SINGLE_ALLOC_PERCENT=100
./your_program
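
To decide in code when to stop allocating on the device, one option is to check each request against both limits mentioned above before creating the buffer. The following is only a minimal sketch in plain C++: `DeviceLimits`, `AllocCheck`, `checkAllocation` and the default `usableFraction` of 0.85 are hypothetical names and values chosen for illustration, not part of OpenCL. The 0.85 margin is an assumption to tune per device; it is not a value guaranteed by any driver.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the two limits reported by the device.
struct DeviceLimits {
    std::uint64_t globalMemSize;   // value of CL_DEVICE_GLOBAL_MEM_SIZE, in bytes
    std::uint64_t maxMemAllocSize; // value of CL_DEVICE_MAX_MEM_ALLOC_SIZE, in bytes
};

enum class AllocCheck { Ok, ExceedsSingleAlloc, ExceedsBudget };

// Returns whether `requested` bytes are likely to fit, given that the
// application has already allocated `alreadyAllocated` bytes on the device.
// `usableFraction` is an assumed safety margin (not an OpenCL constant):
// only that share of the global memory is treated as available.
inline AllocCheck checkAllocation(const DeviceLimits& limits,
                                  std::uint64_t alreadyAllocated,
                                  std::uint64_t requested,
                                  double usableFraction = 0.85)
{
    assert(usableFraction > 0.0 && usableFraction <= 1.0);

    // A single buffer can never be larger than the per-allocation limit.
    if (requested > limits.maxMemAllocSize)
        return AllocCheck::ExceedsSingleAlloc;

    // Total budget: a fraction of the global memory size.
    const auto budget = static_cast<std::uint64_t>(
        static_cast<double>(limits.globalMemSize) * usableFraction);

    // Written so that the subtraction cannot underflow.
    if (alreadyAllocated > budget || requested > budget - alreadyAllocated)
        return AllocCheck::ExceedsBudget;

    return AllocCheck::Ok;
}
```

In a real pipeline, the two limits would come from the device, for example `device.getInfo<CL_DEVICE_GLOBAL_MEM_SIZE>()` and `device.getInfo<CL_DEVICE_MAX_MEM_ALLOC_SIZE>()` with the OpenCL C++ bindings, and `alreadyAllocated` from your own bookkeeping. When the check does not return `Ok`, the pipeline could fall back to a host-backed buffer or process the image in tiles. This is a heuristic only: the driver may reserve memory the application cannot see, and some implementations defer the actual allocation until the buffer is first used.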
Jan-Gerd