I am currently implementing the pose estimation algorithm proposed in Oikonomidis et al., 2011, which involves rendering a mesh in N different hypothesised poses (N will probably be about 64). Section 2.5 suggests speeding up the computation by using instancing to generate multiple renderings simultaneously (after which they reduce each rendering to a single number on the GPU), and from their description, it sounds like they found a way to produce N renderings simultaneously.
In my implementation's setup phase, I use an OpenGL viewport array to define `GL_MAX_VIEWPORTS` viewports. Then in the rendering phase, I transfer an array of `GL_MAX_VIEWPORTS` model-pose matrices to a `mat4` uniform array in GPU memory (I am only interested in estimating position and orientation), and use `gl_InvocationID` in my geometry shader to select the appropriate pose matrix and viewport for each polygon of the mesh.
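For concreteness, this is roughly what my geometry shader stage looks like (the uniform names are placeholders, and I've hardcoded the invocation count to 16 to match `GL_MAX_VIEWPORTS` on my machine):

```glsl
// Geometry shader: one invocation per pose hypothesis. Each invocation
// transforms the triangle by its own pose matrix and routes the output
// to the matching viewport via gl_ViewportIndex.
#version 410 core

layout(triangles, invocations = 16) in;
layout(triangle_strip, max_vertices = 3) out;

uniform mat4 poseMatrices[16];  // one model-pose matrix per hypothesis
uniform mat4 projection;        // shared projection matrix

void main()
{
    for (int i = 0; i < 3; ++i) {
        gl_ViewportIndex = gl_InvocationID;  // select this hypothesis's viewport
        gl_Position = projection
                    * poseMatrices[gl_InvocationID]
                    * gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}
```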
`GL_MAX_VIEWPORTS` is 16 on my machine (I have a GeForce GTX Titan), so this method will allow me to render up to 16 hypotheses at a time on the GPU. This may turn out to be fast enough, but I am nonetheless curious about the following:
Is there a workaround for the `GL_MAX_VIEWPORTS` limitation that is likely to be faster than calling my render function `ceil(double(N)/GL_MAX_VIEWPORTS)` times?
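For reference, the baseline I'd be comparing any workaround against is a simple batching loop along these lines (`Mat4`, `uploadPoses`, and `drawBatch` are placeholders for my own types and code):

```cpp
#include <algorithm>

// Sketch only: render the N hypotheses in batches of at most
// GL_MAX_VIEWPORTS, i.e. ceil(N / maxViewports) render calls in total.
void renderHypotheses(const Mat4* poses, GLint N)
{
    GLint maxViewports = 0;
    glGetIntegerv(GL_MAX_VIEWPORTS, &maxViewports); // 16 on my GPU

    for (GLint first = 0; first < N; first += maxViewports) {
        const GLint count = std::min(maxViewports, N - first);
        uploadPoses(&poses[first], count); // fill the mat4 uniform array
        drawBatch(count);                  // one draw; the geometry shader fans out
    }
}
```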
I only started learning the shader-based approach to OpenGL a couple of weeks ago, so I don't yet know all the tricks. I initially thought of replacing my use of the built-in viewport support with a combination of:

- a geometry shader that adds `h*gl_InvocationID` to the y coordinates of the vertices after perspective projection (where `h` is the desired viewport height) and passes `gl_InvocationID` on to the fragment shader; and
- a fragment shader that `discard`s fragments with y coordinates that satisfy `y < gl_InvocationID*h || y >= (gl_InvocationID+1)*h` (a sketch of both stages follows below).
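Roughly, I imagine the two stages looking something like this (untested; `h_ndc` and `h_px` are my names for the strip height expressed in NDC units and in pixels respectively, and I'm assuming the projection matrix already confines the mesh to a single strip's worth of height):

```glsl
// Geometry shader: render each invocation into its own horizontal strip.
// The offset is applied in clip space, so the NDC offset must be scaled
// by w (NDC y = clip y / w). With 16 strips, h_ndc would be 2.0/16.
#version 410 core

layout(triangles, invocations = 16) in;
layout(triangle_strip, max_vertices = 3) out;

uniform mat4 poseMatrices[16];  // hypothetical uniform name
uniform mat4 projection;
uniform float h_ndc;            // strip height in NDC units

flat out int hypothesisID;      // forwarded to the fragment shader

void main()
{
    for (int i = 0; i < 3; ++i) {
        vec4 p = projection * poseMatrices[gl_InvocationID] * gl_in[i].gl_Position;
        p.y += h_ndc * float(gl_InvocationID) * p.w; // post-projection y offset
        hypothesisID = gl_InvocationID;
        gl_Position = p;
        EmitVertex();
    }
    EndPrimitive();
}
```

```glsl
// Fragment shader: discard fragments outside this hypothesis's strip.
// gl_FragCoord is in window coordinates, so h_px is the strip height
// in pixels (framebuffer height / number of strips).
#version 410 core

flat in int hypothesisID;
uniform float h_px;             // strip height in pixels

out vec4 fragColor;

void main()
{
    if (gl_FragCoord.y <  float(hypothesisID)     * h_px ||
        gl_FragCoord.y >= float(hypothesisID + 1) * h_px)
        discard;
    fragColor = vec4(1.0);      // placeholder shading
}
```

(If I understand correctly, the invocation count would then be bounded by `GL_MAX_GEOMETRY_SHADER_INVOCATIONS` rather than `GL_MAX_VIEWPORTS`, which is why the idea seemed attractive.)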
But I was put off investigating this idea further by the fear that branching and `discard` would be very detrimental to performance.
The authors of the paper above released a technical report describing some of their GPU acceleration methods, but it's not detailed enough to answer my question. Section 3.2.3 says "During geometry instancing, viewport information is attached to every vertex... A custom pixel shader clips pixels that are outside their pre-defined viewports". This sounds similar to the workaround that I've described above, but they were using Direct3D, so it's not easy to compare what they were able to achieve with that in 2011 to what I can achieve today in OpenGL.
I realise that the only definitive answer to my question is to implement the workaround and measure its performance, but it's currently a low-priority curiosity, and I haven't found answers anywhere else, so I hoped that a more experienced GLSL user might be able to offer their time-saving wisdom.