
I'm trying to make an array of vec3 available to a fragment shader. In the targeted application, there could be several hundred elements.

I tested transferring data in the form of a shader storage buffer object, declared as

layout(binding = 0) buffer voxels { vec3 xyz[]; };

and set using glBufferData, but I found that my fragment shader becomes very slow, even with only 33 elements.

Moreover, when I convert the same data into the GLSL code of a const vec3[] and include it in the shader code, the shader becomes noticeably faster.

Is there a better way – faster than an SSBO and more elegant than creating shader code?

As might already be apparent from the above, the array is only read from in the shader. It is constant within the shader as well as over shader invocations for different fragments, so effectively a uniform, and it is set only once or a few times over the runtime of the program.

A. Donda
  • What is your fragment shader doing and how did you determine that it was slow? Also, are you [using `vec3` correctly](https://stackoverflow.com/q/38172696/734069)? – Nicol Bolas May 22 '19 at 02:04
  • @NicolBolas Slow: By comparison of the frame rate with the case in which no data are transferred, and the case in which it is hardcoded, as described. Correctly: I tried different layout specifications, and they appear to make no difference. I always have to pad the `vec3` to 4 elements when sending it to the SSBO, otherwise they do not arrive correctly. Declaring them as `vec4` doesn't make a difference w.r.t. frame rate. – A. Donda May 22 '19 at 04:07
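
A note on the padding mentioned in the comment above: in both std140 and std430, a `vec3` has the base alignment of four floats, so an array of `vec3` has a 16-byte stride regardless of which of the two layouts is chosen. The following is a minimal C sketch of the CPU-side packing under that assumption; `PaddedVec3` and `pack_vec3_padded` are hypothetical names for illustration, not part of any API.

```c
#include <assert.h>
#include <stddef.h>

/* CPU-side mirror of one element of `vec3 xyz[]` in a std140/std430 block:
   three floats of data plus one float of padding, 16 bytes in total. */
typedef struct {
    float x, y, z;
    float pad; /* unused; keeps the array stride at 16 bytes */
} PaddedVec3;

/* Expand `count` tightly packed xyz triples from `src` into padded elements.
   `pack_vec3_padded` is a hypothetical helper name. */
static void pack_vec3_padded(const float *src, size_t count, PaddedVec3 *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        dst[i].x = src[3 * i + 0];
        dst[i].y = src[3 * i + 1];
        dst[i].z = src[3 * i + 2];
        dst[i].pad = 0.0f;
    }
}
```

The padded array can then be uploaded as-is, with `count * sizeof(PaddedVec3)` as the buffer size.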

1 Answer


I'd recommend using the std430 layout specifier on the SSBO given that you are using vec3 data types; otherwise you'll be forced to pad the data, which isn't going to be great. In general, if the buffer has a fixed size, prefer glBufferSubData over glBufferData (the latter may reallocate memory on the GPU).

As yet another alternative, if you are able to target GL 4.4+, consider using glBufferStorage instead (or even better, if GL 4.5 is available, use glCreateBuffers and glNamedBufferStorage). This lets you pass a few more hints to the GL driver about the way in which the buffer will be consumed. I'd try out a few options (e.g. mapping vs. sub-data vs. recreating each time).

robthebloke
  • Thanks for the pointers. Regarding the layout: I found that no matter what layout specifier I use, including `std430`, I always have to pad the data to 4 floats, otherwise they arrive corrupted. I don't understand it; I can only guess it is a bug in the Nvidia driver (GeForce GTX 1060 6GB under Linux). – A. Donda May 22 '19 at 17:53
  • Regarding `glBufferSubData`: Since `glBufferData` is only called once or a few times during runtime, this cannot account for the speed, or can it? Or would this influence data access speed in the shader? Same for `glBufferStorage`. – A. Donda May 22 '19 at 18:01
  • Regarding hints: I can't say I understand the flags passed to `glBufferStorage`. Do you have a tip which flag(s) specifically might improve data access speed in the shader? – A. Donda May 22 '19 at 18:06
  • Ahh yes. Sorry, was being a muppet. Load the voxels as floats (using std430), and construct a vec3 from 3 floats. That *will* work (my bad!). If you are only calling glBufferData once, then the best flag to use for glBufferStorage is zero (i.e. tell the driver you have no intention of ever updating it). If you need to update the data, then you're basically going to use either GL_MAP_WRITE_BIT (and update by mapping) or GL_DYNAMIC_STORAGE_BIT (and update via glBufferSubData). Which is faster, you'll have to profile yourself (it depends on the hardware & driver). – robthebloke May 23 '19 at 04:21
  • If the performance is still too slow, one option is to specify the voxels as a texture, and use texelFetch to fetch the values. SSBOs can have a performance hit due to their atomic update nature. – robthebloke May 23 '19 at 04:23
  • It might also be worth enabling a debug context, and seeing if there is anything entertaining being sent to glDebugMessageCallback. There might be some random warnings being thrown that might be causing the performance issue. – robthebloke May 23 '19 at 04:25
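
The "load the voxels as floats and construct a vec3 from 3 floats" suggestion from the comments could look roughly like the sketch below. It works because a `float` array in a std430 block has a 4-byte stride, so the data can stay tightly packed. The GLSL is shown as a C string; `kVoxelShaderSnippet`, `voxel` and `voxel_buffer_bytes` are hypothetical names, the `readonly` qualifier is an addition not mentioned in the thread, and the upload call in the trailing comment assumes a GL 4.5 context with a buffer created by glCreateBuffers.

```c
#include <assert.h>
#include <stddef.h>

/* GLSL side, kept as a C string so it can be pasted into the shader source.
   The block holds plain floats; three consecutive floats form one voxel. */
static const char *const kVoxelShaderSnippet =
    "layout(std430, binding = 0) readonly buffer voxels { float xyz[]; };\n"
    "vec3 voxel(int i) {\n"
    "    return vec3(xyz[3 * i], xyz[3 * i + 1], xyz[3 * i + 2]);\n"
    "}\n";

/* Bytes to upload for `count` voxels stored as tightly packed xyz triples. */
static size_t voxel_buffer_bytes(size_t count)
{
    /* Guard against overflow in the multiplication below. */
    assert(count <= (size_t)-1 / (3 * sizeof(float)));
    return count * 3 * sizeof(float);
}

/* Upload once with no update flags (assumption: GL 4.5, `buf` from
   glCreateBuffers, `data` pointing at the packed floats):
     glNamedBufferStorage(buf, (GLsizeiptr)voxel_buffer_bytes(n), data, 0); */
```

Compared with padding every element to four floats, this keeps the buffer a quarter smaller at the cost of three indexed loads per voxel; whether that is faster would need profiling on the target hardware.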