How can I optimize the rendering of a large model in OpenGL ES 1.1?

Question

I just finished implementing VBOs in my 3D app and saw a roughly 5-10x speed increase in rendering. What used to render at 1-2 frames per second now renders at 10-11 frames per second.

My question is: are there any further improvements I can make to increase rendering speed? Will triangle strips make a big difference? Currently, vertices are not shared between faces: each face's vertices are unique, even where they overlap.

My Device Utilization is 100%, Tiler Utilization is 100%, Renderer Utilization is 11%, and Resource Bytes is 114,819,072. This is rendering 912,120 faces of a CAD model.

Any suggestions?

[Image: the CAD model]

score 10 · Accepted Answer · edited May 23 '17 at 12:33

A Tiler Utilization of 100% indicates that your bottleneck is in the size of the geometry being sent to the GPU. Whatever you can do to shrink the geometry size can lead to an almost linear reduction in rendering time, in my experience. These tuning steps have worked for me in the past:

- If you aren't already doing so, consider using indexed geometry, which can cut down on geometry size by eliminating redundant vertices. The PowerVR GPUs in iOS devices are also optimized for indexed geometry.
- Try using a smaller data type for your vertex information. I found that I could use GLshort instead of GLfloat for my vertices and normals without losing much precision in the rendering. This significantly compacts your geometry and leads to a nice speed boost in rendering.
- Bin similarly colored vertices and render each bin as one group with a single color set, rather than supplying per-vertex color information. The overhead from the few extra draw calls this requires is vastly outweighed by the speedup you get from not sending all that color information. I saw a ~18% reduction in rendering time by binning the colors in one of my larger models.
- You're already using VBOs, so you've taken advantage of that optimization.
- Don't halt the rendering pipeline at any point. Cut out anything that reads back the current state, such as all `glGet*` calls, because they really disrupt the flow of the PowerVR GPUs.
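To make the indexing suggestion concrete for the asker's case (unshared, per-face vertices), here is a minimal C sketch, not taken from the answer or from Molecules, that collapses bit-identical vertices into a unique-vertex array plus a 16-bit index array. The `Vertex` layout and the `build_indexed_mesh` name are hypothetical, and `uint16_t` stands in for `GLushort` so the sketch compiles without GL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-vertex data: position and normal as floats. */
typedef struct {
    float position[3];
    float normal[3];
} Vertex;

/* Turns an unindexed triangle list (three vertices per face, duplicates
 * allowed) into unique vertices plus 16-bit indices. Vertices are merged
 * only if they are bit-for-bit identical. `unique` needs room for `count`
 * vertices and `indices` for `count` indices. Returns the number of unique
 * vertices, or -1 if more than 65536 would be needed (the most a
 * GL_UNSIGNED_SHORT index can address). */
long build_indexed_mesh(const Vertex *verts, size_t count,
                        Vertex *unique, uint16_t *indices)
{
    assert(count == 0 || (verts != NULL && unique != NULL && indices != NULL));
    size_t unique_count = 0;
    for (size_t i = 0; i < count; ++i) {
        /* Linear search: O(n^2) overall, fine for a demo only. */
        size_t j = 0;
        while (j < unique_count &&
               memcmp(&unique[j], &verts[i], sizeof(Vertex)) != 0)
            ++j;
        if (j == unique_count) {
            if (unique_count == 65536) return -1;
            unique[unique_count++] = verts[i];
        }
        indices[i] = (uint16_t)j;
    }
    return (long)unique_count;
}
```

The two arrays would then be uploaded to a `GL_ARRAY_BUFFER` VBO and a `GL_ELEMENT_ARRAY_BUFFER` VBO and drawn with `glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0)`. The linear search keeps the example short but is quadratic; for a model of this size, a hash table or a sort on the vertex data would be the practical choice.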

There are other things you can do that lead to smaller performance improvements, like interleaving vertex, normal, and texture coordinate data in your VBOs, aligning your data to 4-byte boundaries, etc., but the ones above are what I've found to have the largest impact in tuning my own OpenGL ES 1.1 application.
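A sketch of what an interleaved, 4-byte-aligned layout can look like in C, assuming float positions, normals, and texture coordinates (the `InterleavedVertex`, `AttributeLayout`, and `interleaved_layout` names are hypothetical). Using `offsetof` and `sizeof` rather than hand-counted numbers keeps the stride and offsets correct if the struct changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved record: vertex, normal, texture coordinate.
 * Every member is a 4-byte float, so every attribute starts on a 4-byte
 * boundary and the record size (32 bytes) is a multiple of 4. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
} InterleavedVertex;

/* Byte stride and offsets to hand to the gl*Pointer calls while the
 * interleaved VBO is bound (the offset replaces a client-side pointer). */
typedef struct {
    size_t stride;
    size_t position_offset;
    size_t normal_offset;
    size_t texcoord_offset;
} AttributeLayout;

AttributeLayout interleaved_layout(void)
{
    AttributeLayout layout = {
        sizeof(InterleavedVertex),
        offsetof(InterleavedVertex, position),
        offsetof(InterleavedVertex, normal),
        offsetof(InterleavedVertex, texcoord),
    };
    /* Every attribute, and the stride itself, should be 4-byte aligned. */
    assert(layout.position_offset % 4 == 0);
    assert(layout.normal_offset % 4 == 0);
    assert(layout.texcoord_offset % 4 == 0);
    assert(layout.stride % 4 == 0);
    return layout;
}
```

With the interleaved VBO bound, these values would be passed as, for example, `glVertexPointer(3, GL_FLOAT, stride, (const GLvoid *)position_offset)`; `glNormalPointer` and `glTexCoordPointer` take the same stride with their own offsets.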

Most of these points are covered well in the "Best Practices for Working with Vertex Data" section of Apple's OpenGL ES Programming Guide for iOS.

answered Apr 19 '11 at 18:44

Brad Larson

How does indexed geometry (bullet point 1) work with interleaved vertex arrays? Apple says that interleaved arrays are the most efficient, but I can't figure out how they would work with indexing. Currently I'm using glDrawArrays(). Can you post a quick code sample of how to render using VBOs with glDrawElements()? Thanks. – Davido Apr 20 '11 at 16:19
@Davido - Put per-vertex information in the VBOs as groupings (vertex, normal, texture coordinate, vertex, normal, texture coordinate...). Each index refers to a vertex, so for each index the corresponding vertex, normal, and texture coordinate are pulled in when drawing. Grab the code for Molecules, where you can see an example of interleaved indexed drawing: http://sunsetlakesoftware.com/molecules . Unfortunately, I haven't yet posted the version of the application that also does color-based binning, but I will soon. – Brad Larson Apr 20 '11 at 16:26
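As an illustration of the comment above (a CPU model, not GPU or driver code), this C sketch shows what an indexed draw reads from an interleaved buffer: each index selects one whole record, and each attribute is read at `base + index * stride + offset`. The `Record` struct and the `fetch_attribute` and `fetch_triangle` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical interleaved record: vertex, normal, texture coordinate. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
} Record;

/* Reads `components` floats of one attribute of vertex `index` from an
 * interleaved buffer: record `index` starts at base + index * stride, and
 * the attribute sits `offset` bytes into it. */
static void fetch_attribute(const unsigned char *base, size_t stride,
                            size_t offset, uint16_t index,
                            float *out, size_t components)
{
    memcpy(out, base + (size_t)index * stride + offset,
           components * sizeof(float));
}

/* CPU model of the fetch for triangle `tri` of an indexed GL_TRIANGLES
 * draw: each of its three indices pulls in a whole record. */
void fetch_triangle(const Record *records, const uint16_t *indices,
                    size_t tri, Record corners[3])
{
    assert(records != NULL && indices != NULL && corners != NULL);
    const unsigned char *base = (const unsigned char *)records;
    const size_t stride = sizeof(Record);
    for (size_t k = 0; k < 3; ++k) {
        uint16_t i = indices[3 * tri + k];
        fetch_attribute(base, stride, offsetof(Record, position), i,
                        corners[k].position, 3);
        fetch_attribute(base, stride, offsetof(Record, normal), i,
                        corners[k].normal, 3);
        fetch_attribute(base, stride, offsetof(Record, texcoord), i,
                        corners[k].texcoord, 2);
    }
}
```

This is why indexing and interleaving combine cleanly: the index picks the record, and the per-attribute offsets set up by the `gl*Pointer` calls pick the fields within it.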
Ahhh, after much searching I now remember why I used glDrawArrays instead of glDrawElements in the first place. In OpenGL ES 1.1, glDrawElements does not support GL_UNSIGNED_INT, which effectively limits a single index array to addressing no more than 65536 vertices. The problem is that my model with about a million faces has about 760,000 indices in its index array. Using glDrawElements would greatly increase the complexity of the code because of that limit of 65536 vertices per index array. Any ideas on this one, or am I stuck with glDrawArrays? – Davido Apr 20 '11 at 17:15
@Davido - In my case, I use multiple VBOs for storing geometry with more than 65536 indices. It's not that hard to construct an array of them and segment your model appropriately. I still suspect that the buffer-switching and draw-call overhead would be outweighed by the faster rendering from the reduced geometry size. You could do a quick calculation of your geometry size with and without indexing to see how much you could gain by going this way. – Brad Larson Apr 20 '11 at 17:30
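One way to do the segmenting described above, sketched in C under the assumption that the model is first expressed as a triangle list with 32-bit indices into one large vertex array (the asker's current unshared data would need converting first). It greedily closes a batch whenever the next triangle would push it past `max_vertices` distinct vertices, and rewrites each corner as a 16-bit index local to its batch. `Batch` and `split_into_batches` are hypothetical names; the caller allocates `local` and `remap_out` with `3 * tri_count` entries each and `batches` with up to one entry per triangle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One draw batch: a run of consecutive triangles whose vertices fit in
 * 16-bit indices. */
typedef struct {
    size_t first_triangle;  /* index of the batch's first input triangle */
    size_t triangle_count;
    size_t first_vertex;    /* where the batch's vertices start in remap_out */
    size_t vertex_count;    /* distinct vertices referenced by the batch */
} Batch;

/* Splits a GL_TRIANGLES list with 32-bit indices into batches that each
 * reference at most max_vertices distinct vertices (65536 for
 * GL_UNSIGNED_SHORT). For every corner, local[] receives a 16-bit index
 * relative to its batch; remap_out[] lists, batch after batch, the global
 * vertex each local vertex was copied from. Returns the number of batches,
 * or 0 if there are no triangles or the scratch allocation fails. */
size_t split_into_batches(const uint32_t *indices, size_t tri_count,
                          size_t vertex_count, size_t max_vertices,
                          uint16_t *local, uint32_t *remap_out,
                          Batch *batches)
{
    assert(max_vertices >= 3 && max_vertices <= 65536);
    if (tri_count == 0) return 0;

    /* slot[v] is vertex v's local index in the current batch, or -1. */
    long *slot = malloc(vertex_count * sizeof *slot);
    if (slot == NULL) return 0;
    for (size_t v = 0; v < vertex_count; ++v) slot[v] = -1;

    size_t batch_count = 0, vertices_out = 0;
    Batch cur = {0, 0, 0, 0};

    for (size_t t = 0; t < tri_count; ++t) {
        /* Vertices this triangle would add (a degenerate triangle may be
         * over-counted, which only makes the split more conservative). */
        size_t fresh = 0;
        for (size_t k = 0; k < 3; ++k)
            if (slot[indices[3 * t + k]] < 0) ++fresh;

        if (cur.vertex_count + fresh > max_vertices) {
            /* Close the current batch and clear its entries in slot[]. */
            for (size_t i = 0; i < cur.vertex_count; ++i)
                slot[remap_out[cur.first_vertex + i]] = -1;
            batches[batch_count++] = cur;
            cur.first_triangle = t;
            cur.triangle_count = 0;
            cur.first_vertex = vertices_out;
            cur.vertex_count = 0;
        }

        for (size_t k = 0; k < 3; ++k) {
            uint32_t g = indices[3 * t + k];
            if (slot[g] < 0) {
                slot[g] = (long)cur.vertex_count;
                remap_out[vertices_out++] = g;
                cur.vertex_count++;
            }
            local[3 * t + k] = (uint16_t)slot[g];
        }
        cur.triangle_count++;
    }
    batches[batch_count++] = cur;  /* the last batch is never empty here */
    free(slot);
    return batch_count;
}
```

Each batch's vertices (`remap_out[first_vertex]` through `remap_out[first_vertex + vertex_count - 1]`) would be copied into their own VBO, and the batch drawn with one `glDrawElements(GL_TRIANGLES, 3 * triangle_count, GL_UNSIGNED_SHORT, ...)` call over its slice of `local`. Passing 65536 as `max_vertices` matches the `GL_UNSIGNED_SHORT` limit; smaller values are handy for testing.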
Based on my own run of GLBenchmark, the iPad 2 should be able to achieve about 9,807,360 polygons per second. Dividing that by my roughly 920,000 faces gives a theoretical limit of 10.66 frames per second, correct? So if Instruments already shows my frame rate switching back and forth between 10 and 11 fps, how will further optimizations affect my app's performance if I'm already at the limit? Just curious. – Davido Apr 21 '11 at 17:13
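The asker's estimate, written out. The polygon rate comes from the asker's own GLBenchmark run and is not independently verified; using the exact face count from the question (912,120) instead of the rounded one changes the bound only slightly:

$$
\text{fps}_{\max} \approx \frac{9{,}807{,}360\ \text{polygons/s}}{920{,}000\ \text{polygons/frame}} \approx 10.66\ \text{frames/s},
\qquad
\frac{9{,}807{,}360}{912{,}120} \approx 10.75\ \text{frames/s}
$$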
@Davido - Interesting that you're that close to the GLBenchmark limit. I've only gotten to 80-90% of that within my own application, even after the above optimizations. Still, that's just a benchmark: it may have flaws or limitations on this hardware, or may not reflect the exact scene conditions you're rendering under. I'd be very surprised if you didn't at least see a performance boost from switching your numerical representation from GLfloat to GLshort. At worst, you won't improve on your current frame rate, and then you'll know you're at the maximum. – Brad Larson Apr 21 '11 at 17:23
Good suggestion. I've seen a lot of references to using GLshort instead of GLfloat and scaling your values, and I know Molecules uses GLshort for rendering, but could you outline the process for converting floats to shorts for rendering? It always helps to see the big picture for new concepts. – Davido Apr 21 '11 at 18:28
@Davido - Basically, you change your coordinate space from -1.0 .. 1.0 to -32767 .. 32767. I do this by having wrapper methods for adding the vertices, normals, etc. to the VBO so that you don't have to change any of your upstream processing, just that last-second conversion to GLshort when adding to the final VBO data. You also need to adjust `glOrtho()`, but that's it. You can just re-use what I have in Molecules for doing this, with very little customization for your application. It only changes a handful of lines of code. – Brad Larson Apr 21 '11 at 18:37
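A minimal sketch of the float-to-short conversion described above. This is not the Molecules code; it assumes one uniform scale for the whole model, taken from its largest absolute coordinate, and returns the factor that maps the shorts back to the original units. `convert_to_shorts` is a hypothetical name, and `int16_t` stands in for `GLshort`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts coordinates to int16_t (standing in for GLshort) using one
 * uniform scale for the whole model: the largest absolute coordinate maps
 * to +/-32767, so every value lands in -32767 .. 32767. Returns the factor
 * that maps the shorts back to the original units. */
float convert_to_shorts(const float *coords, size_t count, int16_t *out)
{
    assert(count == 0 || (coords != NULL && out != NULL));

    float max_abs = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        float a = coords[i] < 0.0f ? -coords[i] : coords[i];
        if (a > max_abs) max_abs = a;
    }
    if (max_abs == 0.0f) max_abs = 1.0f;  /* all zeros: avoid dividing by 0 */

    for (size_t i = 0; i < count; ++i) {
        float scaled = coords[i] / max_abs * 32767.0f;  /* -32767 .. 32767 */
        /* Round to nearest, halfway cases away from zero. */
        out[i] = (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    }
    return max_abs / 32767.0f;
}
```

With the returned factor `s`, a `glScalef(s, s, s)` before drawing (the approach the asker ends up using) or, equivalently for an orthographic projection, multiplying the `glOrtho()` bounds by `1 / s` restores the original proportions.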
I have it all set up to render shorts, but the program crashes when I run it. The only difference is that I create a `vector<short>` instead of a `vector<float>`. Any ideas? http://stackoverflow.com/questions/5749735/error-switching-from-vectorfloat-to-vectorshort – Davido Apr 21 '11 at 20:59
I got the code working, but it runs at 3-4 fps, compared with the 10-11 fps I was getting with floats. The only explanation I can think of is that my vertex components don't end on 4-byte boundaries; they end on 6-byte boundaries. Could this really affect performance that much? If so, do I just add an extra padding short to each vertex when I'm writing my file and then change my stride values, or is there something else I need to change as well? – Davido Apr 22 '11 at 15:21
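A sketch of the padding the asker proposes, assuming interleaved short positions and normals (`PaddedVertex` and `make_padded_vertex` are hypothetical names; `int16_t` stands in for `GLshort`). Without padding, each 3-component attribute is 6 bytes, so the normal would start at offset 6 and the stride would be 12; one unused short after each attribute puts the normal at offset 8 and makes the stride 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interleaved short position + normal with one padding short after each,
 * so every attribute starts on a 4-byte boundary. */
typedef struct {
    int16_t position[3];
    int16_t pad0;        /* unused: moves normal from offset 6 to 8 */
    int16_t normal[3];
    int16_t pad1;        /* unused: makes the stride 16 instead of 12 */
} PaddedVertex;

/* Builds one record with the padding zeroed. */
PaddedVertex make_padded_vertex(const int16_t position[3],
                                const int16_t normal[3])
{
    assert(position != NULL && normal != NULL);
    PaddedVertex v = {
        {position[0], position[1], position[2]}, 0,
        {normal[0], normal[1], normal[2]}, 0,
    };
    return v;
}
```

The stride passed to `glVertexPointer(3, GL_SHORT, ...)` and `glNormalPointer(GL_SHORT, ...)` changes from 12 to 16 bytes, and the normal offset from 6 to 8; the padding values are never read.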
OK, I did some experimenting, and yes, 4-byte boundaries do have a huge impact on performance. However, even when rendering on 4-byte boundaries, the only advantage of using shorts was that my model size was significantly smaller. Floats still rendered faster, at 10-11 fps, while shorts rendered at 9-10 fps. Any idea why? I do have an extra glPushMatrix(), glPopMatrix(), glTranslatef() and glScalef() call for each rendered frame when using shorts. – Davido Apr 22 '11 at 15:44
