
I am coding my own rendering engine. Currently I am working on terrain. I render the terrain using glDrawArraysInstanced. The terrain is made out of a lot of "chunks". Every chunk is one quad which is also one instance of the draw call. Each quad is then tessellated in tessellation shaders. For my shader inputs I use VBOs, instanced VBOs (using vertex attribute divisor) and texture buffers. This is a simple example of one of my shaders:

#version 410 core

layout (location = 0) in vec3 perVertexVector; // VBO attribute  
layout (location = 1) in vec3 perInstanceVector; // VBO instanced attribute
uniform samplerBuffer someTextureBuffer; // texture buffer
out vec3 outputVector;

void main()
{
    // some processing of the inputs;
    outputVector = something...whatever...;
} 

Everything works fine and I get no errors. It renders at around 60-70 FPS. But today, while changing the code a bit, I had to replace all the instanced VBOs with texture buffers. For some reason the performance doubled and it now runs at 120-160 FPS (sometimes even more)! I didn't change anything else; I just created more texture buffers and used them instead of all the instanced attributes.

This was my code for creating an instanced attribute for the shader (simplified to a readable version):

glBindBuffer(GL_ARRAY_BUFFER, VBO);
glBufferData(GL_ARRAY_BUFFER, size, buffer, GL_DYNAMIC_DRAW);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat), (GLvoid*)0); // location 1 = perInstanceVector
glEnableVertexAttribArray(1);
glVertexAttribDivisor(1, 1); // advance once per instance, which makes the attribute instanced

This is my simplified code for creating the texture buffer:

glBindTexture(GL_TEXTURE_BUFFER, textureVBO);
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGB32F, VBO);

I don't think I am doing anything wrong, because everything works correctly. It's just the performance. I would assume that attributes are faster than textures, but I got the opposite result, and I am really surprised that texture buffers are more than twice as fast as attributes.

But there is one more thing that I don't understand.

I actually call the render function for the terrain (glDrawArraysInstanced) twice per frame. The first call renders the terrain, and the second renders it to the FBO with a different transformation matrix for the water reflection. When I render it only once (without the reflection) and use the instanced attributes, I get around 90 FPS, which is a bit faster than the 60 FPS I mentioned earlier.

But when I render it only once and use the texture buffers, the difference is really small. It runs just as fast as when I render it twice (around 120-150 FPS)!

I am wondering whether it uses some kind of caching, but that doesn't make sense to me because the vertices are transformed with a different matrix in each of the two render calls, so the shaders output totally different results.


I would really appreciate some explanation for this question:

Why is the texture buffer faster than the instanced attributes?


EDIT:

Here is a summary of my question for better understanding:

The only thing I do is change these lines in my GLSL code:

layout (location = 1) in vec3 perInstanceVector; // VBO instanced attribute
outputVector = perInstanceVector;

to this:

uniform samplerBuffer textureBuffer; // texture buffer which has the same data as the previous VBO instanced attribute
outputVector = texelFetch(textureBuffer, gl_InstanceID).xyz;

Everything works exactly as before but it is twice as fast in terms of performance.

MarGenDo
  • Why does your shader claim to take a `vec3` for the instanced array attribute, but your actual OpenGL code only passes a single float per instance? Also, why are you performing instancing on *a single quad*? – Nicol Bolas Jun 13 '16 at 19:18
  • The shader code I posted here was just an example, it doesn't actually look like this. I simplified it to be easier to read. It is correct in my code. I just made a typo here. I edited it, so that it matches. And I am rendering a lot of instances of the quad. – MarGenDo Jun 13 '16 at 19:36
  • I added a short summary of my question so it should be easier to understand now. – MarGenDo Jun 14 '16 at 16:41
  • I think it has to do with the pre-T&L cache. Here is some information: http://stackoverflow.com/questions/29623938/cache-friendly-vertex-definition/29624130#29624130 and http://stackoverflow.com/questions/37428688/is-instancing-faster-on-gpu#comment62366951_37428688 – j-p Jun 15 '16 at 03:43
  • Yeah, I am sure it has to use some caching. But why is there such a big performance gain when I use a texture buffer instead of the attributes? Shouldn't it use the caching for both options? – MarGenDo Jun 15 '16 at 08:45
  • This will be GPU / driver dependent. I'm curious, which GPU / OS are you using? – James Bedford Jul 09 '16 at 11:47
  • I am running on Windows 10 Home, 64-bit, NVIDIA GeForce GT 520MX – MarGenDo Jul 09 '16 at 17:46

1 Answer


I see three possible reasons:

  1. The two shaders may have different occupancy because they use registers differently, which can change performance considerably.
  2. The data is not fetched in the same way in the two cases, and the scheduler may handle the wait for a fetch better inside the shader than in the input assembler.
  3. The texture buffer path may have less driver overhead.

Have you tried a different number of primitives, or measuring with timers?
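As a starting point for the timer suggestion, here is a minimal CPU-side sketch in C++. The helper name averageMs is hypothetical and not part of any library. It only measures wall-clock time around a callable, and because OpenGL commands execute asynchronously, the callable would have to end with glFinish() for the number to reflect GPU work; a GL_TIME_ELAPSED query object is the more precise alternative and is not shown here.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` `iterations` times and returns the
// mean wall-clock time per call in milliseconds.
template <typename Work>
double averageMs(Work&& work, int iterations) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();  // e.g. one terrain draw call followed by glFinish()
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

Timing the attribute version and the texture buffer version separately, and with different instance counts, would show whether the gap scales with the amount of geometry or stays constant, which helps tell shader-side cost apart from per-draw overhead.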