I'm doing adjustments to images with OpenGL, so I need this kind of workflow:
1. Upload image data to graphics card
2. Transform image
3. Download result back to main memory
Each step stalls until the previous step has finished. This is fine: overlapping the work with other operations would not be an improvement for me, because what I need is to finish this one image as quickly as possible.
Step 2 is really quick, and step 3 is not so bad, most likely because the result is a thumbnail of the original image and therefore vastly smaller.
Step 1 is my bottleneck. Uploading 20MB of image data takes 1.2 seconds, which puts me at roughly 16MB/s. Elsewhere on the Internet I read about people expecting 5.5GB/s, and being disappointed by 2.5GB/s.
It doesn't matter if I use glTexImage2D directly or do it via a PBO. I have tried both, and measured no difference. This makes sense, since I'm not multiplexing with anything. For my pipeline, I am unable to use PBO without stalling immediately anyway.
The remaining explanation I can think of is this: my system is just this slow. My graphics card is an NVIDIA GeForce GTX 285 (GT200), attached via PCI Express x16. Is my measured 16MB/s as fast as this is going to get, or have I overlooked something? Is there a utility (for Ubuntu, or Linux in general) that lets me measure the maximum data rate?
I don't feel comfortable concluding that the system is this slow; after all, my network interface is enormously faster (1Gb/s, about 125MB/s) and achieves that over nothing more than a Cat 5e cable.
Further details: The glTexImage2D case is pretty straightforward:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, image.width, image.height, 0, GL_RGBA, GL_UNSIGNED_BYTE, rawData);
Timing this line alone measures ~1200ms.
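For context, this is roughly how such a single-call measurement can be taken. It is a sketch rather than my exact code: time_ms and time_host_copy_ms are hypothetical names, and the host-to-host memcpy is only a stand-in workload so that the example compiles without an OpenGL context.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Stand-in workload (assumption): copies `bytes` bytes between host buffers.
// In the real program the lambda would instead call glTexImage2D(...)
// followed by glFinish(), so the timer also covers any work the driver
// has queued but not yet completed.
double time_host_copy_ms(std::size_t bytes) {
    std::vector<unsigned char> src(bytes, 0x7f), dst(bytes);
    const double ms =
        time_ms([&] { std::memcpy(dst.data(), src.data(), bytes); });
    if (!dst.empty()) {
        // Read the result so the copy is less likely to be optimized away.
        volatile unsigned char sink = dst.back();
        (void)sink;
    }
    return ms;
}
```

Timing a plain 20MB host copy this way also gives a rough baseline to set against the ~1200ms upload, although it says nothing about the PCI Express link itself.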
I have also translated it to use PBO, as mentioned:
GLuint pbo = 0;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, data_size, pixels, GL_STREAM_DRAW);
glTexImage2D(target, level, internalformat, width, height, border, format, type, 0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
glDeleteBuffers(1, &pbo);
I also tried it with memory mapping:
glBufferData(GL_PIXEL_UNPACK_BUFFER, data_size, 0, GL_STREAM_DRAW);
GLubyte* ptr = (GLubyte*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
std::copy(pixels, pixels+data_size, ptr);
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
There is no appreciable difference in timing between any of these approaches.
What kind of data rates should I expect when uploading texture data?
Is 16MB/s reasonable with my setup? (I feel "no". Please tell me if it is!)
Is there a tool I can use to verify that this is the speed of my system, thereby vindicating my code, or alternatively placing the blame definitively on my code?