How taxing are OpenGL glDrawElements() calls compared to basic logic code?

Question

I'm planning to do some optimization on my OpenGL program (it doesn't need optimizing, but I'm doing it for the sake of it). Out of curiosity, how expensive are OpenGL drawing functions compared to basic logic code? At the moment, I'm starting a game where the screen is filled with squares that represent a 2D blocky landscape. This means that the draw call for a square (two triangles) is made many times. I'm planning to add some code that looks at the positions of the blocks in the current frame and groups them together. For example, if there is a column that is 7 blocks high, instead of making 7 separate drawBlock() calls (each of which contains a glDrawElements() call), I could call one function that draws a 1 x 7 rectangle, and so on across the screen.

I won't bother doing this if the code that calculates what to draw actually uses more CPU time than just drawing the blocks individually would.

I'm not sure how your question relates to `glDrawElements`. It sounds like you're trying to decide whether merging adjacent quads is worth it; in that case, the best way to decide that would be through profiling. — Colonel Thirty Two, Dec 28 '15 at 00:27
@ColonelThirtyTwo: Ordinarily profiling would be in order, but in this case, it's a foregone conclusion. `glDrawElements` is notoriously expensive. — Dietrich Epp, Dec 28 '15 at 00:32
The `glDrawElements()` call is very, very expensive. I recommend putting all of your background blocks into a single VBO and then you can call `glDrawElements()` once for the entire background. Or split it up into chunks, if that's too large… but one quad per `glDrawElements()` is not going to result in very many quads per frame. — Dietrich Epp, Dec 28 '15 at 00:35
I follow everything up to the last sentence: "...but one quad per `glDrawElements()` is not going to result in very many quads per frame". Well, that's what I'm doing now, so if it won't result in many quads, do you still mean I should make it more efficient? Adding vertices to the array seems a bit tricky though! Thanks for your help anyway :) — lbowes, Dec 28 '15 at 10:17

Nicol Bolas · Accepted Answer · 2016-05-27T12:49:36.150

The cost of glDrawElements (or any other OpenGL rendering command) cannot really be estimated in isolation, because it depends a great deal on what OpenGL state you changed between draw calls. Calling an OpenGL state-changing function (basically, any OpenGL function that isn't a glGet of some form or a glDraw of some form) is relatively quick in itself, but it makes the next draw call slower.

This video on OpenGL performance shows which state changes are more costly at draw time than others. The really good part starts around 31 minutes in.

Draw calls are relatively fast if you haven't changed any OpenGL state between draw calls. Different pieces of state have different effects on draw calls. From fastest to slowest (according to NVIDIA's presentation above, so take it with a grain of salt):

Non-UBO uniform updates
Vertex buffer bindings (without changing formats)
UBO binding
Vertex format changes
Texture bindings
Fragment post-processing state changes
Shader program changes
Render target switches
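One way to act on this ordering is to sort the draws for a frame so that the most expensive state changes happen least often. The following is only a minimal sketch of that idea: `DrawItem`, its fields, and both function names are hypothetical and do not come from the presentation or from OpenGL itself, and the sort key covers just four of the states listed above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical per-draw record; the field names are illustrative only.
struct DrawItem {
    std::uint32_t renderTarget;  // slowest to switch in the list above
    std::uint32_t program;       // shader program
    std::uint32_t texture;
    std::uint32_t vertexBuffer;  // among the cheapest to rebind
};

// Sort so that the costliest state is the outermost sort key and
// therefore changes the fewest times over the frame.
inline void sortByStateCost(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.renderTarget, a.program, a.texture, a.vertexBuffer)
                       < std::tie(b.renderTarget, b.program, b.texture, b.vertexBuffer);
              });
}

// Count how often the shader program would change if the items were
// submitted in their current order.
inline int countProgramSwitches(const std::vector<DrawItem>& items) {
    int switches = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i].program != items[i - 1].program) {
            ++switches;
        }
    }
    return switches;
}
```

Sorting is only valid when the draw order does not affect the result, for example with opaque geometry and depth testing; blended geometry usually has to keep its back-to-front order.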

Now, a draw call will be more expensive than "basic logic". Draw calls are not cheap, even without state changes between them. If efficiency is important to your code, then grouping your squares is a good idea.
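As a rough illustration of the grouping idea, the sketch below merges vertically adjacent blocks in each column into one rectangle and then packs all rectangles into a single vertex/index array. It assumes the landscape is stored as a grid of booleans with equal-length rows; `Rect`, `Mesh`, `mergeColumns`, and `buildMesh` are hypothetical names, not part of the asker's code or of OpenGL.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only.
struct Rect { int x, y, w, h; };  // position and size in block units

struct Mesh {
    std::vector<float> vertices;         // x, y per vertex
    std::vector<std::uint32_t> indices;  // two triangles per rectangle
};

// Merge each vertical run of solid blocks in a column into one rectangle.
// grid[y][x] is true where a block exists; all rows are assumed to have
// the same length.
std::vector<Rect> mergeColumns(const std::vector<std::vector<bool>>& grid) {
    std::vector<Rect> rects;
    if (grid.empty()) return rects;
    const int height = static_cast<int>(grid.size());
    const int width = static_cast<int>(grid[0].size());
    for (int x = 0; x < width; ++x) {
        int y = 0;
        while (y < height) {
            if (!grid[y][x]) { ++y; continue; }
            const int start = y;
            while (y < height && grid[y][x]) ++y;  // extend the run
            rects.push_back({x, start, 1, y - start});
        }
    }
    return rects;
}

// Pack every rectangle into one vertex array and one index array.
Mesh buildMesh(const std::vector<Rect>& rects, float blockSize) {
    Mesh mesh;
    for (const Rect& r : rects) {
        const auto base = static_cast<std::uint32_t>(mesh.vertices.size() / 2);
        const float x0 = r.x * blockSize;
        const float y0 = r.y * blockSize;
        const float x1 = (r.x + r.w) * blockSize;
        const float y1 = (r.y + r.h) * blockSize;
        mesh.vertices.insert(mesh.vertices.end(),
                             {x0, y0, x1, y0, x1, y1, x0, y1});
        mesh.indices.insert(mesh.indices.end(),
                            {base, base + 1u, base + 2u,
                             base, base + 2u, base + 3u});
    }
    return mesh;
}
```

The resulting arrays could be uploaded to buffer objects with `glBufferData` and drawn with one `glDrawElements(GL_TRIANGLES, ...)` call using `GL_UNSIGNED_INT` indices. The mesh only needs rebuilding when the landscape changes, so the merging cost is not paid every frame.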

score 0 · Answer 2 · answered Dec 28 '15 at 17:40

The actual numbers are highly platform and vendor dependent. Driver architectures on different operating systems vary substantially, and some of them are more efficient than others. On top of that, driver implementations and hardware can cause large performance differences. For example, I've seen 10-20 times higher draw call throughput for one vendor compared to another vendor, on the same platform and with comparable hardware.

Based on this, any numbers below are just a very rough order of magnitude. You really need to measure this yourself on the configurations you care about.

With all these disclaimers, I would expect that a draw call could be processed in the range of 100 instructions (CPU cycles). This is for the case where you just make back to back draw calls, and there are no other bottlenecks in the pipeline.

As @NicolBolas already pointed out, the most expensive part of handling draw calls is normally processing deferred state changes. And most of the time, you will have state changes between draw calls. In this case, for relatively cheap state changes (like binding a texture or buffer, or changing some attributes), a few hundred instructions are typical.
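To put these per-call figures in perspective, here is a back-of-the-envelope calculation. The 3 GHz clock and the 60 frames-per-second target are assumptions chosen for illustration, not numbers from this answer, and the function name is hypothetical. The result is a theoretical ceiling for a CPU doing nothing but issuing draw calls.

```cpp
#include <cstdint>

// Upper bound on draw calls per frame if the CPU did nothing else.
// All three inputs are rough assumptions, not measured values.
constexpr std::uint64_t maxDrawCallsPerFrame(std::uint64_t cyclesPerSecond,
                                             std::uint64_t framesPerSecond,
                                             std::uint64_t cyclesPerDrawCall) {
    return cyclesPerSecond / framesPerSecond / cyclesPerDrawCall;
}
```

With 3,000,000,000 cycles per second, 60 frames per second, and 100 cycles per call, this gives 500,000 draw calls per frame. At 300 cycles per call, standing in for "a few hundred", the ceiling drops to roughly a third of that. Real applications will land well below either figure because game logic, the GPU, and other bottlenecks share the frame budget.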

Switching frame buffers is generally quite expensive, and very expensive on some platforms. Other than that, the numbers I measured in the past while optimizing and benchmarking state changes showed an order that is quite different from the list in @NicolBolas' answer. But again, this is highly platform and vendor/hardware dependent.

There are a couple more aspects that make this somewhat tricky to measure:

Most of the CPU time might not be consumed in your thread. Many drivers are multi-threaded, meaning that most of the work needed to process OpenGL calls is offloaded to a secondary thread. If your application does not use all CPU cores, and you're not throttled by power/thermal limits, this means that a lot of the driver work can happen in parallel, without slowing down your application much. But particularly on mobile devices and laptops, performance is often limited by power consumption, so the driver overhead will still slow you down.
CPU time consumed by the driver is only part of what can slow your application code down. Another consideration is cache pollution. If cache content used by your application is evicted while the OpenGL implementation processes your draw calls, your own code will get more cache misses, and will run slower. So measuring the time spent inside the OpenGL calls only shows part of the picture.
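Because of both effects, it can be more informative to time a whole frame's CPU work (application logic plus OpenGL submission) than to time the OpenGL calls alone. The helper below is a minimal sketch of that approach; `elapsedMicroseconds` is a hypothetical name, and it only measures wall-clock time on the calling thread, so it captures neither GPU execution time nor work done on driver threads.

```cpp
#include <chrono>

// Time one invocation of a callable, in microseconds, using a monotonic
// clock. Intended to wrap an entire frame's update-and-render work.
template <typename F>
double elapsedMicroseconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(end - start).count();
}
```

Comparing the average over many frames for the one-quad-per-call version against the merged version gives a more reliable answer than a single sample, since individual frame times can be noisy.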
