Images used with high-speed image processing algorithms commonly have padding at the end of each line, so that each line's size in pixels or bytes is a multiple of 4, 8, 16, 32, and so on. This makes it much easier to optimize certain algorithms for speed, especially in combination with SIMD instruction sets like SSE on x86 or NEON on ARM. In your case the padding is 12 pixels, which suggests that Apple optimizes its algorithms for processing 32 pixels per line: 852 is not divisible by 32, but 864 is, so each line is padded by 12 pixels to maintain a 32-pixel alignment. The correct technical terms are size and stride size, or in the case of images, width and stride width. The width is the amount of actual pixel data per line; the stride width is the real line size, including pixel data and any padding at the end of the line.
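The arithmetic behind that guess can be sketched in C: rounding the width up to the next multiple of the alignment yields the stride width. This is a minimal sketch; the function name align_stride is hypothetical, and the 32-pixel alignment is only inferred from the numbers in your case.

```c
#include <assert.h>

/* Round a line width up to the next multiple of `alignment` pixels.
 * `alignment` must be a power of two (32 is the value inferred above). */
static unsigned int align_stride(unsigned int width, unsigned int alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Adding alignment - 1 and clearing the low bits rounds up. */
    return (width + alignment - 1) & ~(alignment - 1);
}
```

With width = 852 and an alignment of 32 this gives 864, i.e. 12 padding pixels per line.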
Standard OpenGL allows you to load textures with a stride width bigger than the actual texture width. This is achieved by setting the glPixelStore parameter GL_UNPACK_ROW_LENGTH accordingly. Note that this "stride padding skip" is usually implemented within the CPU part of the driver, so no operation is performed on the GPU; in fact, the driver removes the extra padding before uploading the data to the GPU. As OpenGL ES is designed to run on embedded devices which may have very limited CPU resources available, this option was left out of OpenGL ES 1.x and 2.0 to keep driver development simple, even for very weak embedded CPUs. This leaves you with four options to deal with your problem:
(1) Preprocess the texture to remove the padding using a C copy loop that skips the extra pixels at the end of each line. This is rather slow but easy to implement.
(2) Preprocess the texture as in option (1), but use compiler SIMD intrinsics to make use of NEON instructions. This will be about 2 times faster than option (1), but it's also harder to implement, and you'll need some knowledge of NEON instructions and how to use them to achieve this goal.
(3) Preprocess the texture as in option (2), but use a pure assembly implementation. This will be about 3 times faster than option (2), so about 6 times faster than option (1), but it's also a lot harder to implement, since you'll need knowledge of ARM assembly programming plus NEON instructions.
(4) Load the texture with padding and adjust the texture coordinates so that OpenGL ignores the padding pixels. Depending on how complex your texture mapping is, this might be very easy to implement; it's faster than any of the other options above, and the only downside is that you waste a little more texture memory on the GPU.
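Option (1) is short enough to sketch. The following is a minimal C version, assuming 8-bit interleaved pixels (3 bytes per pixel for GL_RGB) and a destination buffer allocated by the caller; the function name strip_row_padding is hypothetical. It loops over the lines and copies only the real pixel data of each one with memcpy, skipping the padding at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Option (1): copy a padded image into a tightly packed buffer.
 * Assumes `bytesPerPixel` bytes per pixel (3 for GL_RGB) and that `dst`
 * has room for width * height * bytesPerPixel bytes. */
static void strip_row_padding(unsigned char *dst, const unsigned char *src,
                              size_t width, size_t height,
                              size_t strideWidth, size_t bytesPerPixel)
{
    assert(strideWidth >= width);
    size_t rowBytes = width * bytesPerPixel;          /* real pixel data per line */
    size_t strideBytes = strideWidth * bytesPerPixel; /* line size including padding */
    for (size_t y = 0; y < height; ++y) {
        /* Copy the visible pixels of line y and skip its padding. */
        memcpy(dst + y * rowBytes, src + y * strideBytes, rowBytes);
    }
}
```

After stripping, dst would be uploaded with glTexImage2D using width rather than strideWidth. For GL_RGB, a tightly packed 852-pixel line is 2556 bytes, a multiple of 4, so the default GL_UNPACK_ALIGNMENT of 4 needs no change in your case.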
I know very little about ARM assembly programming and even less about NEON instructions, so I cannot really help you with options (2) and (3). I could show you an implementation of option (1); however, I'm afraid it might be too slow for your purpose. That leaves the last option, which I have used many times myself.
We declare 3 variables: width, height, and stride width.
GLsizei width = 852;
GLsizei height = 640;
GLsizei strideWidth = 864;
When you load the texture data (assuming rawData points to the raw image bytes), you pretend that strideWidth is the real width:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB,
strideWidth, height, 0, GL_RGB, GL_UNSIGNED_BYTE, rawData);
Texture coordinates in OpenGL are normalized, which means the lower left corner is always (0.0f, 0.0f) and the upper right corner is always (1.0f, 1.0f), regardless of the texture's actual size in pixels. These two values could be called (x, y), but to avoid confusing them with vertex coordinates, they are called (s, t) instead.
To make OpenGL cut off the padding pixels, you just need to scale all s-coordinates by a certain factor; let's call it SPCF (Stride Padding Cut Factor), which you calculate as follows:
float SPCF = (float)width / strideWidth;
So instead of the texture coordinate (0.35f, 0.6f), you would use (0.35f * SPCF, 0.6f). Of course, you shouldn't perform this calculation on every rendered frame. Instead, copy the original texture coordinates, adjust all s-coordinates once by SPCF, and then use these adjusted coordinates when rendering frames. If you ever reload the texture and SPCF has changed, repeat the adjustment, starting again from the original coordinates. If width equals strideWidth, this algorithm still works: SPCF is then 1.0f and won't alter the s-coordinates at all, which is correct, since there is no padding to cut off.
The downside of this trick is that in your case the texture needs about 1.4% more memory than would otherwise be necessary (12 extra pixels per 852-pixel line), which also means that the texture upload via glTexImage2D will be roughly 1.4% slower. I guess that is acceptable and still much faster than any of the other CPU-intensive options above.
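The memory overhead of uploading the padded image as-is can be computed directly. This is a minimal sketch, assuming GL_RGB with 3 bytes per pixel as in the glTexImage2D call above; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Extra bytes needed when the padded image is uploaded as-is. */
static size_t padding_overhead_bytes(size_t width, size_t strideWidth,
                                     size_t height, size_t bytesPerPixel)
{
    assert(strideWidth >= width);
    return (strideWidth - width) * height * bytesPerPixel;
}

/* Relative overhead in percent; independent of height and pixel format. */
static double padding_overhead_percent(size_t width, size_t strideWidth)
{
    assert(width > 0 && strideWidth >= width);
    return 100.0 * (double)(strideWidth - width) / (double)width;
}
```

For 852 x 640 pixels with a stride width of 864, this gives 23,040 extra bytes, or 12 / 852, about 1.41% of the unpadded size.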