I was trying to make ffmpeg decode and transform pixels into rgb8 format and write into a mapped pixel buffer and use streaming to update opengl texture, which is then rendered to a sdl window.
The decoding and uploading happens in a dedicated thread(make sws_scale writes to the mapped buffer), and the rendering is done in a the render thread in another context with sharing. (The PBO actually holds several frames, and the texture is a 2d array texture, to decouple the positions.)
Things works fine if I flush the mapped range in the decoding thread, and use glTextureSubImage3D in the render thread to update the texture at needed index. The integrated Intel gpu works pretty fast (it should) in this scenario, but the NV driver complains about Pixel-path performance warning: Pixel transfer is synchronized with 3D rendering.
I thought that might be that only glTextureSubImage3D actually does the upload, so I moved the glTextureSubImage3D right after the flush operation. This time the NV gpu works fine, and the warning disappears, whereas the intel gpu gives a black window, and only shows decoded content on closing.
The code is something like this:
//render thread
void RenderFrame(SDL_Window* window,GLobjects& glo, int index, int width, int height) {
glUniform1f(glo.index_location,index);
//The function in question
glTextureSubImage3D(glo.texture, 0, 0, 0, index, width, height, 1, GL_RGBA, GL_UNSIGNED_BYTE, (void*)(index * width * height * 4));
glClear(GL_COLOR_BUFFER_BIT);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
SDL_GL_SwapWindow(window);
}
//Decode thread
int DecodeFrameToPBO(GLobjects& glo, int index){
//fill the mapped range needed
glFlushMappedBufferRange(GL_PIXEL_UNPACK_BUFFER, index * width * height * 4, 4 * width * height);
//The function in question
//glTextureSubImage3D(glo.texture, 0, 0, 0, index, width, height, 1, GL_RGBA, GL_UNSIGNED_BYTE, (void*)(index * width * height * 4));
}
I'm really confused by the idea of client-side memory and how the driver asynchronously uploads the texture, where exactly is the upload is supposed to happen and what glTextureSubImage3D actually does when a GL_PIXEL_UNPACK_BUFFER is bound?
EDIT:
After adding a glFlush() command to flush the upload context's command queue after each upload, the intel version works properly without black screen.
UPDATE:
Adding the glFlush() seems to make the NV gpu emit warning 'Pixel-path performance warning: Pixel transfer is synchronized with 3D rendering.' again, and same video sample's GPU utilization grew from 8% to 10%. It seems the glFlush() triggers some internal synchronization that perhaps make things go under busy wait? Since the Intel GPU cannot work without the glFlush, even with the clientwaitsync version with the flush command bit set, and calling flush explicitly on the render side also does not work. So what should be done in order to make both driver happy (and reduce utilization)?