I was trying to make ffmpeg decode frames and convert the pixels to rgb8 format, write them into a mapped pixel buffer object (PBO), and stream them into an OpenGL texture, which is then rendered to an SDL window.

The decoding and uploading happen in a dedicated thread (sws_scale writes directly to the mapped buffer), and the rendering is done in the render thread, in another context that shares objects with the first. (The PBO actually holds several frames, and the texture is a 2D array texture, to decouple the decode and render positions.)

Things work fine if I flush the mapped range in the decoding thread and call glTextureSubImage3D in the render thread to update the texture at the needed index. The integrated Intel GPU is pretty fast in this scenario (as it should be), but the NV driver complains: Pixel-path performance warning: Pixel transfer is synchronized with 3D rendering.

I thought that perhaps only glTextureSubImage3D actually does the upload, so I moved the glTextureSubImage3D call to right after the flush operation. This time the NV GPU works fine and the warning disappears, whereas the Intel GPU gives a black window and only shows decoded content on closing.

The code is something like this:

//render thread
void RenderFrame(SDL_Window* window, GLobjects& glo, int index, int width, int height) {
    //select the array layer to sample
    glUniform1f(glo.index_location, static_cast<float>(index));
    //The function in question: with a buffer bound to GL_PIXEL_UNPACK_BUFFER,
    //the last argument is a byte offset into that buffer, not a client pointer
    glTextureSubImage3D(glo.texture, 0, 0, 0, index, width, height, 1, GL_RGBA, GL_UNSIGNED_BYTE, (const void*)((size_t)index * width * height * 4));
    glClear(GL_COLOR_BUFFER_BIT);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    SDL_GL_SwapWindow(window);
}

//Decode thread
int DecodeFrameToPBO(GLobjects& glo, int index){
    //width and height are the frame dimensions, presumably defined elsewhere
    //fill the mapped range needed
    glFlushMappedBufferRange(GL_PIXEL_UNPACK_BUFFER, (GLintptr)index * width * height * 4, (GLsizeiptr)4 * width * height);
    //The function in question
    //glTextureSubImage3D(glo.texture, 0, 0, 0, index, width, height, 1, GL_RGBA, GL_UNSIGNED_BYTE, (const void*)((size_t)index * width * height * 4));
    return 0; //placeholder status code
}
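
Both calls above compute the same byte offset for a frame slot. A minimal sketch of that arithmetic as a shared helper follows, assuming tightly packed frames of 4 bytes per pixel stored back to back in the PBO. The names FrameSlotOffset and FrameSlotPointer are hypothetical and not part of the original code. Doing the multiplication in std::size_t is a defensive choice: in plain int, large frames combined with many slots could exceed the int range.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not in the original code): byte offset of frame slot
// `index` inside a PBO that stores tightly packed 4-byte-per-pixel frames
// back to back. The product is computed in std::size_t rather than int.
constexpr std::size_t FrameSlotOffset(int index, int width, int height) {
    return static_cast<std::size_t>(index) * static_cast<std::size_t>(width) *
           static_cast<std::size_t>(height) * 4u;
}

// Hypothetical helper: the same offset as the pointer-typed argument that
// pixel-transfer calls take while a buffer is bound to GL_PIXEL_UNPACK_BUFFER.
inline const void* FrameSlotPointer(int index, int width, int height) {
    return reinterpret_cast<const void*>(
        static_cast<std::uintptr_t>(FrameSlotOffset(index, width, height)));
}
```

With helpers like these, the flush would start at FrameSlotOffset(index, width, height) and cover FrameSlotOffset(1, width, height) bytes, and the texture update would pass FrameSlotPointer(index, width, height) as its last argument.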

I'm really confused by the idea of client-side memory and how the driver asynchronously uploads the texture. Where exactly is the upload supposed to happen, and what does glTextureSubImage3D actually do when a GL_PIXEL_UNPACK_BUFFER is bound?

EDIT:

After adding a glFlush() call to flush the upload context's command queue after each upload, the Intel version works properly, without a black screen.

UPDATE:

Adding the glFlush() seems to make the NV GPU emit the warning 'Pixel-path performance warning: Pixel transfer is synchronized with 3D rendering.' again, and the same video sample's GPU utilization grew from 8% to 10%. It seems the glFlush() triggers some internal synchronization that perhaps makes things busy-wait? The Intel GPU cannot work without the glFlush(): the glClientWaitSync version with the flush-commands bit set does not work, and calling flush explicitly on the render side does not work either. So what should be done to make both drivers happy (and reduce utilization)?

shangjiaxuan

1 Answer

I think you are misled by the NVIDIA warning. It does not imply that there is CPU-GPU synchronization; it only tells you that the rendering of the texture is synchronized with (has to wait for) the upload of the texture, which is fine. See this answer for more details.

So my answer is: there is no issue, hence the solution is to not change anything.

I thought that might be that only glTextureSubImage3D actually does the upload, so I moved the glTextureSubImage3D right after the flush operation [into the decode thread].

If you do that, you now have to manually synchronize the rendering with the texture upload, or you will encounter half-written frames or even undefined content at times, and basically have a race condition.

You could do such synchronization with OpenGL Sync Objects. But in the end, you would not get more performance than in your original variant.
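
A rough sketch of what such fence-based synchronization could look like, assuming an OpenGL 4.5 context pair with sharing, GLEW as the loader (any loader would do), and the PBO already bound to GL_PIXEL_UNPACK_BUFFER in the upload context. UploadSlot and WaitForSlot are hypothetical names, and handing the GLsync from one thread to the other (for example through a mutex-protected queue) is left out. It has not been checked against the Intel driver discussed in the question.

```cpp
#include <GL/glew.h>  // assumption: any loader exposing OpenGL 4.5 works

// Upload thread (own context, sharing objects with the render context).
// Call after the mapped range for this slot has been flushed.
// Returns a fence that signals once the copy into the texture has completed.
GLsync UploadSlot(GLuint texture, int index, int width, int height) {
    const GLintptr offset = static_cast<GLintptr>(index) * width * height * 4;
    glTextureSubImage3D(texture, 0, 0, 0, index, width, height, 1,
                        GL_RGBA, GL_UNSIGNED_BYTE,
                        reinterpret_cast<const void*>(offset));
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();  // submit the fence so that another context can wait on it
    return fence;
}

// Render thread: call before the draw that samples this slot. The wait
// happens on the GPU side; the calling CPU thread is not blocked.
void WaitForSlot(GLsync fence) {
    glWaitSync(fence, 0, GL_TIMEOUT_IGNORED);
    glDeleteSync(fence);  // deletion is deferred until the wait is satisfied
}
```

The glFlush() after glFenceSync matters when two contexts are involved: without it the fence may stay in the upload context's command queue, and the render context would wait on something that has not been submitted yet.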

whereas the intel gpu gives a black window, and only shows decoded content on closing

It is not clear if this is only a result of the missing synchronization, a bug in your code, or even a driver bug.

derhass
  • It seems it's not a syncing issue with the Intel driver. After adding an explicit fence after the glTextureSubImage3D in the upload thread (fence) and before the draw arrays call (wait), running with the Intel GPU still gives a black screen (or I was doing it wrong). – shangjiaxuan Feb 13 '20 at 13:49
  • Well, what is going on in your intel case is not debuggable with the information given in the question. – derhass Feb 13 '20 at 14:21