Async SSBO Readback

Question

When I call GetBufferSubData on my Shader Storage Buffer Object, there is typically a 4 ms delay. Is it possible for my application to do other work during that time?

// start GetBufferSubData
// do client/app/CPU work
// (wait if needed)
// read results from GetBufferSubData

Or otherwise use some sort of API to asynchronously start copying buffer data from the GPU?

I was able to get an async readback working using glMapBufferRange and GL_MAP_PERSISTENT_BIT. However, when running a compute shader (multiple times back to back) on that buffer, this results in a massive performance degradation compared to no persistent mapping.

"*It shows as being extremely performance intensive on my GPU.*" What exactly is "performance intensive"? Persistent mapping is something you do *once*; that's why it's called "persistent". You map the buffer and just use the pointer whenever you need to. — Nicol Bolas, Jun 21 '22 at 04:38
With the persistent mapping, ALL values are being copied back and forth, so there are bandwidth issues. I need to be able to obtain sub ranges only. — Leon Frickenschmidt, Jun 21 '22 at 04:45
"*With the persistent mapping, ALL values are being copied back and forth*" Nonsense. If you map persistently, *nothing* is copied. You have a pointer to the storage in question. That's it. If you copy something, it's because you copied it by reading from the pointer. — Nicol Bolas, Jun 21 '22 at 04:47
Here was my understanding: The SSBO data is on the GPU. Reading any of that data requires copying it into RAM. The mapped part is just a copy of it in RAM (that could even be out of date). The data on the GPU is being modified by a shader. So when does that transfer take place with a persistently mapped buffer? When is that transfer time paid to move from GPU to RAM? — Leon Frickenschmidt, Jun 21 '22 at 05:02
"*The mapped part is just a copy of it in RAM*" That's not what persistent mapping is. When you persistently map a buffer, your pointer is a CPU-addressable pointer to the GPU's memory for that storage. Any reading happens when you dereference that pointer, just like it would for any other CPU read of memory. — Nicol Bolas, Jun 21 '22 at 13:24

score 0 · Accepted Answer · answered Jun 25 '22 at 12:47

The issue with simply marking the buffer with GL_MAP_PERSISTENT_BIT was that it caused a substantial performance degradation (8x slower) when running a compute shader on that buffer (profiled using Nvidia Nsight Graphics). I suspect this is because, once the buffer is persistently mapped, OpenGL has to keep it in a different memory location that is slower for the GPU to access but more accessible to the CPU.

My solution was to create a much smaller, persistently mapped buffer (1000x smaller, 16 KB) that the CPU uses to read from and write to the larger buffer in small increments when needed. This combination was much faster, with only minor API overhead, and it met my needs.
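As a rough illustration of the "small increments" part, here is a minimal C++ sketch that splits a requested sub-range of the large buffer into pieces that fit the small staging buffer. The names Chunk and planChunks are hypothetical and not from my actual code. The OpenGL calls listed in the comments are one plausible way to move each piece, assuming the staging buffer was created with glBufferStorage and mapped with GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT. They are left as comments so the sketch runs without a GL context.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One copy from the large SSBO into the staging buffer (hypothetical type).
struct Chunk {
    std::size_t srcOffset; // byte offset into the large SSBO
    std::size_t size;      // bytes to copy, never more than the staging capacity
};

// Split the byte range [offset, offset + size) into staging-sized chunks.
std::vector<Chunk> planChunks(std::size_t offset, std::size_t size,
                              std::size_t stagingCapacity) {
    assert(stagingCapacity > 0);
    std::vector<Chunk> chunks;
    while (size > 0) {
        const std::size_t n = std::min(size, stagingCapacity);
        chunks.push_back({offset, n});
        offset += n;
        size -= n;
    }
    return chunks;
}

// Assumed per-chunk sequence on the GL side (not executed here):
//   1. glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT) after the compute
//      dispatch, so the copy sees the shader's writes.
//   2. glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
//                          chunk.srcOffset, 0, chunk.size);
//   3. GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
//   4. Do other CPU work, then poll or wait with
//      glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, timeoutNs).
//   5. Once the fence has signaled, read chunk.size bytes through the
//      persistently mapped pointer, then glDeleteSync(fence).
```

With a 16 KB staging buffer, a request for 40000 bytes starting at offset 100 would be split into three copies. The large SSBO itself is never mapped, so the compute shader keeps running on a buffer without GL_MAP_PERSISTENT_BIT.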
