
I want to use two PBOs to read pixels back alternately. I expected the PBO path to be much faster, because glReadPixels returns immediately when a PBO is bound, so a lot of the transfer time can be overlapped with other work.
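The alternating scheme can be sketched as follows. `PboPingPong` is a hypothetical helper, not part of OpenGL; it only tracks which of the two buffers receives this frame's glReadPixels and which one, filled a frame earlier, can be mapped. The GL calls appear as comments and assume both PBOs were already created and sized with glBufferData.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an OpenGL API): picks the PBO for each role.
struct PboPingPong {
    std::size_t frame = 0;

    // Buffer that receives this frame's glReadPixels.
    std::size_t packIndex() const { return frame % 2; }

    // Buffer filled during the previous frame, to be mapped now.
    std::size_t mapIndex() const {
        assert(hasPendingData() && "no previous frame to map yet");
        return (frame + 1) % 2;
    }

    // On the very first frame nothing has been read back yet.
    bool hasPendingData() const { return frame > 0; }

    void advance() { ++frame; }
};

// Assumed per-frame usage, with GLuint pbo[2] created beforehand:
//   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[pp.packIndex()]);
//   glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, 0);  // 0 = offset into the PBO
//   if (pp.hasPendingData()) {
//       glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[pp.mapIndex()]);
//       void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
//       /* consume pixels */
//       glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
//   }
//   glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
//   pp.advance();
```

The point of the one-frame delay is that the buffer being mapped belongs to a transfer that has had a whole frame to finish, so the map is less likely to stall.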

Strangely, there seems to be little benefit. Consider code like this:

    // Path 1: no PBO bound, so glReadPixels writes straight into client memory (buf).
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
    Timer t; t.start();
    glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, buf);
    t.stop(); std::cout << t.getElapsedTimeInMilliSec() << " ";

    // Path 2: PBO bound, so the last argument is a byte offset into the PBO.
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo);
    t.start();
    glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, 0);
    t.stop(); std::cout << t.getElapsedTimeInMilliSec() << std::endl;

The result (in ms; left column without PBO, right column with PBO) is:

1.301 1.185
1.294 1.19
1.28 1.191
1.341 1.254
1.327 1.201
1.304 1.19
1.352 1.235

The PBO path is a little faster, but it is not the immediate return I expected.
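One thing the measurement above does not capture: with a PBO bound, any wait for the transfer can move from glReadPixels to the later glMapBuffer call, so it may help to time both phases separately. A minimal sketch of a timing helper built on std::chrono follows; `elapsedMs` is a hypothetical stand-in for the `Timer` class used above, whose implementation is not shown here.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <class F>
double elapsedMs(F&& work) {
    const auto begin = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}

// Assumed usage around the two phases of a PBO readback:
//   double issue = elapsedMs([&] {
//       glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, 0);
//   });
//   double fetch = elapsedMs([&] {
//       void* p = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
//       /* copy or inspect p */
//       glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
//   });
```

If `issue` is small and `fetch` is large, the readback is asynchronous and the cost has simply moved to the map; if `issue` itself stays large, something is forcing glReadPixels to block.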

My questions are:

  • What factors affect the performance of glReadPixels? Sometimes it costs as much as 10 ms, but here it is 1.3 ms.
  • Why does an "immediate return" still cost 1.2 ms? Is that too much, or just normal?

===========================================================================

By comparing my code with a demo, I found two factors:

  • GL_BGRA is faster than GL_RGBA: 1.3 ms => 1.0 ms (no PBO), 1.2 ms => 0.9 ms (with PBO).
  • Using glutInitDisplayMode(GLUT_RGB|GLUT_ALPHA) rather than GLUT_RGBA: 0.9 ms => 0.01 ms. That's the performance I want. On my system, GLUT_RGBA = GLUT_RGB = 0 and GLUT_ALPHA = 8.

That raises two more questions:

  • Why is GL_BGRA faster than GL_RGBA? Is that true only on specific platforms, or on all platforms?
  • Why is GLUT_ALPHA so important that it affects PBO performance this much?
Martin Wang
  • I just tested this on my system: GLUT_RGBA gives 330 fps, GLUT_RGB|GLUT_ALPHA gives 630 fps. That's an increase of roughly a factor of 2. I had missed that GLUT_ALPHA was important. – ColacX Sep 22 '13 at 08:27
  • For me on macOS, I had to make sure that my buffer size was a power of two! Otherwise glReadPixels with the PBO would block – Luke Apr 12 '20 at 15:45

2 Answers


I do not know glutInitDisplayMode by heart, but this typically happens because your internal and external formats do not match. For example, you won't see the asynchronous behaviour when the number of components does not match, because the conversion still blocks glReadPixels.

So the most likely issue is that glutInitDisplayMode(GLUT_RGBA) creates a default framebuffer whose internal format is actually RGB or even BGR. Passing GLUT_ALPHA is likely to make it RGBA or BGRA internally, which matches the number of components you are asking for.
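The flag values reported in the question make this easy to check: if GLUT_RGBA and GLUT_RGB are both 0 and GLUT_ALPHA is 8, then requesting GLUT_RGBA alone never sets the alpha bit. The sketch below restates those values as local constants rather than including the GLUT header; the `k...` names and `requestsAlpha` are illustrative, not GLUT identifiers.

```cpp
// Values as reported in the question for that system's GLUT header.
constexpr unsigned kGlutRgb   = 0;
constexpr unsigned kGlutRgba  = 0;
constexpr unsigned kGlutAlpha = 8;

// True when the display-mode mask explicitly asks for an alpha channel.
constexpr bool requestsAlpha(unsigned displayMode) {
    return (displayMode & kGlutAlpha) != 0;
}

static_assert(!requestsAlpha(kGlutRgba),
              "GLUT_RGBA alone does not set the alpha bit");
static_assert(requestsAlpha(kGlutRgb | kGlutAlpha),
              "GLUT_RGB|GLUT_ALPHA does set it");
```

In a real program, the number of alpha bits the window actually received could be queried after creation, for example with glGetIntegerv(GL_ALPHA_BITS, ...) in a compatibility context.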

Edit: I found an NVIDIA document explaining some issues with pixel packing and their influence on performance.

Edit 2: The performance gain with BGRA is likely because the internal hardware buffer is stored as BGRA; there is not much more to it.

KillianDS
  • Can you get async glReadPixels() if you read DEPTH_COMPONENT instead of RGBA? I can't get glReadPixels to return immediately when using a PBO. – Bram May 04 '19 at 01:51

BGRA is the fastest since this is the native format on modern GPUs. RGBA, RGB and BGR need 'reformatting' during readback.
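To make 'reformatting' concrete, here is a CPU-side sketch of the per-pixel channel swap that turning BGRA storage into RGBA output implies. It only illustrates the kind of extra work involved; `bgraToRgba` is a hypothetical function, and how a particular driver or GPU actually performs the conversion is not specified in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical illustration: swaps the blue and red channels of every
// 4-byte pixel in place, turning BGRA data into RGBA (or the reverse).
void bgraToRgba(std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() % 4 == 0 && "expects tightly packed 4-byte pixels");
    for (std::size_t i = 0; i < pixels.size(); i += 4) {
        std::swap(pixels[i], pixels[i + 2]);  // B <-> R; G and A stay put
    }
}
```

For a 1024x1024 readback this loop would touch 1,048,576 pixels (4 MiB of data), which is work that a matching format avoids entirely.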

eile