How does interleaved vertex submission help performance?

Question

I have read and seen other questions that all generally suggest interleaving vertex positions, colors, etc. into one array, as this minimizes the data that gets sent from CPU to GPU.

What I'm not clear on is how OpenGL does this when, even with an interleaved array, you must still make separate GL calls for position and color pointers. If both pointers use the same array, just set to start at different points in that array, does the draw call not copy the array twice since it was the object of two different pointers?

Related: [Performance gain using interleaved arrays](http://stackoverflow.com/questions/14874914/performance-gain-using-interleaved-attribute-arrays-in-opengl4-0) — legends2k, Feb 11 '14 at 05:00

h4lc0n · Accepted Answer · 2013-01-27T19:11:15.973

7

This is mostly about cache. For example, imagine we have 4 vertices and 4 colors. You can provide the information this way (excuse me, but I don't remember the exact function names):

glVertexPointer(..., vertex);
glColorPointer(..., colors);

What it does internally is read vertex[0], then apply colors[0], then read vertex[1] with colors[1], and so on. As you can see, if the vertex array is, for example, 20 MB long, vertex[0] and colors[0] will be at least 20 MB apart from each other.

Now, on the other hand, if you provide a structure like { vertex0, color0, vertex1, color1, etc. } there will be many more cache hits, because vertex0 and color0 sit next to each other in memory, and so do vertex1 and color1.
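To put rough numbers on that distance, here is a minimal sketch in plain C (no OpenGL calls). The attribute sizes (3 floats per position, 4 floats per color, 4-byte floats) and the assumption that the separate arrays are stored back to back are mine, purely for illustration; in practice two separate allocations could be anywhere in memory.

```c
#include <stddef.h>

/* Hypothetical attribute sizes: 3 floats per position, 4 floats per color. */
enum { POS_BYTES = 3 * sizeof(float), COL_BYTES = 4 * sizeof(float) };

/* Separate layout (assumed back to back): all positions, then all colors.
 * Returns the byte distance between position i and color i. */
size_t separate_gap(size_t vertex_count, size_t i)
{
    size_t pos_offset = i * POS_BYTES;
    size_t col_offset = vertex_count * POS_BYTES + i * COL_BYTES;
    return col_offset - pos_offset;
}

/* Interleaved layout: color i starts right after position i. */
size_t interleaved_gap(void)
{
    return POS_BYTES;
}
```

With one million vertices, separate_gap(1000000, 0) is about 12 MB, while interleaved_gap() stays at 12 bytes no matter how many vertices there are.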

Hope this helps answer the question

edit: on second read, I may not have answered the question. You might be wondering how OpenGL knows which values to read from that structure. As I said before, with a structure such as { vertex, color, vertex, color } you tell OpenGL that the vertex data starts at position 0 with a stride of 2 (so the next one is at position 2, then 4, etc.) and that the color data starts at position 1, also with a stride of 2 (so position 1, then 3, etc.). In the actual API, the offset and stride are given in bytes rather than in elements.
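As a rough sketch of how those offsets and strides could be computed, the following plain C uses a hypothetical Vertex struct (3 position floats, 4 color floats; the struct, the Layout type and vertex_layout are my own names, not part of OpenGL). The gl*Pointer calls appear only in a comment, since they need a live GL context and a bound buffer.

```c
#include <stddef.h>

/* Hypothetical interleaved vertex: position followed by color. */
typedef struct {
    float position[3];
    float color[4];
} Vertex;

/* Byte offsets and stride describing the interleaved layout. */
typedef struct {
    size_t stride;           /* bytes from one vertex to the next */
    size_t position_offset;  /* byte offset of position inside a vertex */
    size_t color_offset;     /* byte offset of color inside a vertex */
} Layout;

Layout vertex_layout(void)
{
    Layout l = { sizeof(Vertex), offsetof(Vertex, position), offsetof(Vertex, color) };
    return l;
}

/* With a buffer object bound to GL_ARRAY_BUFFER, the values would be passed
 * roughly like this (legacy fixed-function API, not executed here):
 *
 *   glVertexPointer(3, GL_FLOAT, (GLsizei)l.stride, (const void *)l.position_offset);
 *   glColorPointer(4, GL_FLOAT, (GLsizei)l.stride, (const void *)l.color_offset);
 */
```

Both pointer calls describe the same buffer; they differ only in the starting byte offset, so the array itself is specified once.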

addition: In case you want a more practical example, look at this link http://www.lwjgl.org/wiki/index.php?title=Using_Vertex_Buffer_Objects_(VBO). You can see there how it only provides the buffer once and then uses offsets to render efficiently.

edited Jan 27 '13 at 19:11

answered Jan 26 '13 at 09:13

h4lc0n


Ah OK, so it is a cache issue outside the actual OpenGL API. I was hoping the draw calls were smart enough to realize that the pointers were looking at the same array, so that OpenGL would only pass the array once rather than passing it from CPU to GPU twice, even if that second time benefits from caching. – johnbakers Jan 26 '13 at 14:39
It does, I just don't remember the details well enough to give you the exact functions used. I believe it was glBufferData. You provide only one array, and then glVertexPointer/glColorPointer/etc. only specify initial offsets and strides. So you provide one single array, and everything works with that single array in memory. – h4lc0n Jan 26 '13 at 15:01
@SebbyJohanns Yes, it seems I do remember something after all: "If a non-zero named buffer object is bound to the GL_ARRAY_BUFFER target (see glBindBuffer) while a vertex array is specified, pointer is treated as a byte offset into the buffer object's data store.". Check at http://www.opengl.org/sdk/docs/man2/xhtml/glVertexPointer.xml – h4lc0n Jan 26 '13 at 15:25
@SebbyJohanns added more info for a more practical approach – h4lc0n Jan 27 '13 at 19:11

score 4 · Answer 2 · answered Jan 26 '13 at 12:08

I suggest reading: Vertex_Specification_Best_Practices

h4lc0n provided quite a nice explanation, but I would like to add some additional info:

Interleaved data can actually hurt performance when your data changes often. For instance, when you change the position of point sprites, you update POS, but COLOR and TEXCOORD usually stay the same. With interleaved data, every update must then also "touch" the unchanged attributes. In that case it would be better to have one VBO for POS only (or, in general, for data that changes often) and a second VBO for data that is constant.
It is not easy to give strict rules about VBO layout, since it is very vendor/driver specific, and your usage can differ from that of others. In general, you need to benchmark your particular use cases.
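A back-of-the-envelope sketch of the first point, in plain C: it compares how many bytes would be re-uploaded per update if the whole buffer containing positions is rewritten. The attribute sizes and type names are hypothetical, and real drivers may transfer data differently, so treat this as an estimate rather than a measurement.

```c
#include <stddef.h>

/* Hypothetical attribute groups for a point sprite. */
typedef struct { float pos[3]; } DynamicAttribs;                  /* changes often */
typedef struct { float color[4]; float texcoord[2]; } StaticAttribs; /* constant */

/* One interleaved VBO: rewriting positions rewrites every attribute. */
size_t upload_bytes_interleaved(size_t vertex_count)
{
    return vertex_count * (sizeof(DynamicAttribs) + sizeof(StaticAttribs));
}

/* Split VBOs: only the position buffer has to be rewritten. */
size_t upload_bytes_split(size_t vertex_count)
{
    return vertex_count * sizeof(DynamicAttribs);
}
```

For 1000 sprites under these assumptions, the interleaved layout rewrites 36000 bytes per update and the split layout 12000 bytes.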

agreed, what I said before mainly stands for quick cache-friendly reading — h4lc0n, Jan 26 '13 at 12:11

score 0 · Answer 3 · answered Jan 26 '13 at 12:34

You could also make an argument for separating the attributes. Assuming a GPU does not process one vertex after another but rather a batch (e.g. 16) of them in parallel, you would get something like this while executing a vertex shader:

read attribute A for all 16 vertices
perform some computations
read attribute B for all 16 vertices
perform some more computations
....

So you read one attribute for many vertices at once. By this reasoning, interleaving the attributes would actually hurt performance. Of course, this would only be visible if you are either bandwidth constrained or if the memory latency cannot be hidden for some reason (e.g. a complex shader that requires many registers will reduce the number of vertices that can be in flight at a given time).
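To make the argument concrete, here is a small sketch in plain C that counts how many memory lines are touched when one attribute is read for a batch of vertices. The 64-byte line size and the attribute sizes used below are assumptions for illustration only; real GPU memory systems differ, and lines_touched is my own helper name.

```c
#include <stddef.h>

/* Counts distinct memory lines touched when reading attr_bytes bytes
 * (must be > 0) from each of `count` vertices spaced `stride` bytes apart,
 * starting at a line-aligned address. */
size_t lines_touched(size_t count, size_t stride, size_t attr_bytes, size_t line_bytes)
{
    size_t lines = 0;
    size_t last_line = (size_t)-1; /* sentinel: no line seen yet */

    for (size_t i = 0; i < count; ++i) {
        size_t first = (i * stride) / line_bytes;
        size_t last = (i * stride + attr_bytes - 1) / line_bytes;
        /* Addresses only increase, so comparing against the most
         * recently counted line is enough to avoid double counting. */
        for (size_t line = first; line <= last; ++line) {
            if (line != last_line) {
                ++lines;
                last_line = line;
            }
        }
    }
    return lines;
}
```

With a 12-byte attribute and a batch of 16 vertices, lines_touched(16, 28, 12, 64) gives 7 lines for an interleaved 28-byte vertex, while lines_touched(16, 12, 12, 64) gives 3 lines for a tightly packed separate array.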
