My goal is to read the contents of the default OpenGL framebuffer and store the pixel data in a cv::Mat. Apparently there are two different ways of achieving this:
1) Synchronous: use FBO and glReadPixels
cv::Mat a = cv::Mat::zeros(cv::Size(1920, 1080), CV_8UC3);
glReadPixels(0, 0, 1920, 1080, GL_BGR, GL_UNSIGNED_BYTE, a.data);
2) Asynchronous: use PBO and glReadPixels
cv::Mat b = cv::Mat::zeros(cv::Size(1920, 1080), CV_8UC3);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_userImage);
glReadPixels(0, 0, 1920, 1080, GL_BGR, GL_UNSIGNED_BYTE, 0);
unsigned char* ptr = static_cast<unsigned char*>(glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY));
std::copy(ptr, ptr + 1920 * 1080 * 3 * sizeof(unsigned char), b.data);
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
From all the information I collected on this topic, the asynchronous version 2) should be much faster. However, comparing the elapsed time for both versions shows that the differences are often minimal, and sometimes version 1) even outperforms the PBO variant.
For performance checks, I've inserted the following code (based on this answer):
std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
....
std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
std::cout << "Time difference = " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << std::endl;
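For reference, the same measurement can be wrapped in a small reusable helper so that both versions are timed in exactly the same way and averaged over several frames. This is only a sketch: the names elapsed_us and mean_elapsed_us are made up for illustration and are not part of my actual code or of any library.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in microseconds, measured with steady_clock.
template <typename F>
std::int64_t elapsed_us(F&& work) {
    const auto begin = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count();
}

// Hypothetical helper: runs `work` `runs` times and returns the mean
// elapsed time in microseconds, to smooth out frame-to-frame noise.
template <typename F>
double mean_elapsed_us(F&& work, int runs) {
    assert(runs > 0);
    std::int64_t total = 0;
    for (int i = 0; i < runs; ++i) {
        total += elapsed_us(work);
    }
    return static_cast<double>(total) / runs;
}
```

Each readback variant would then be passed in as a lambda, for example elapsed_us([&] { /* version 1) or 2) */ }). Note that this only measures the CPU-side time spent inside the calls.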
I've also experimented with the usage hint when creating the PBO: I didn't find much of a difference between GL_DYNAMIC_COPY and GL_STREAM_READ here.
I'd be happy for suggestions on how to increase the speed of this pixel read operation from the framebuffer even further.