I am currently implementing the pose estimation algorithm proposed in Oikonomidis et al., 2011, which involves rendering a mesh in N different hypothesised poses (N will probably be about 64). Section 2.5 suggests speeding up the computation by using instancing to generate multiple renderings simultaneously (after which they reduce each rendering to a single number on the GPU), and from their description, it sounds like they found a way to produce N renderings simultaneously.
In my implementation's setup phase, I use an OpenGL viewport array to define `GL_MAX_VIEWPORTS` viewports. Then in the rendering phase, I transfer an array of `GL_MAX_VIEWPORTS` model-pose matrices to a `mat4` uniform array in GPU memory (I am only interested in estimating position and orientation), and use `gl_InvocationID` in my geometry shader to select the appropriate pose matrix and viewport for each polygon of the mesh.
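For concreteness, this is roughly what my geometry shader stage looks like (the uniform names are placeholders, and I've hardcoded the invocation count to 16 to match `GL_MAX_VIEWPORTS` on my machine):

```glsl
// Geometry shader: one invocation per pose hypothesis. Each invocation
// transforms the triangle by its own pose matrix and routes the output
// to the matching viewport via gl_ViewportIndex.
#version 410 core

layout(triangles, invocations = 16) in;
layout(triangle_strip, max_vertices = 3) out;

uniform mat4 poseMatrices[16];  // one model-pose matrix per hypothesis
uniform mat4 projection;        // shared projection matrix

void main()
{
    for (int i = 0; i < 3; ++i) {
        gl_ViewportIndex = gl_InvocationID;  // select this hypothesis's viewport
        gl_Position = projection
                    * poseMatrices[gl_InvocationID]
                    * gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}
```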
`GL_MAX_VIEWPORTS` is 16 on my machine (I have a GeForce GTX Titan), so this method will allow me to render up to 16 hypotheses at a time on the GPU. This may turn out to be fast enough, but I am nonetheless curious about the following:
Is there a workaround for the `GL_MAX_VIEWPORTS` limitation that is likely to be faster than calling my render function `ceil(double(N)/GL_MAX_VIEWPORTS)` times?
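For reference, the baseline I'd be comparing any workaround against is a simple batching loop along these lines (`Mat4`, `uploadPoses`, and `drawBatch` are placeholders for my own types and code):

```cpp
#include <algorithm>

// Sketch only: render the N hypotheses in batches of at most
// GL_MAX_VIEWPORTS, i.e. ceil(N / maxViewports) render calls in total.
void renderHypotheses(const Mat4* poses, GLint N)
{
    GLint maxViewports = 0;
    glGetIntegerv(GL_MAX_VIEWPORTS, &maxViewports); // 16 on my GPU

    for (GLint first = 0; first < N; first += maxViewports) {
        const GLint count = std::min(maxViewports, N - first);
        uploadPoses(&poses[first], count); // fill the mat4 uniform array
        drawBatch(count);                  // one draw; the geometry shader fans out
    }
}
```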
I only started learning the shader-based approach to OpenGL a couple of weeks ago, so I don't yet know all the tricks. I initially thought of replacing my use of the built-in viewport support with a combination of:

- a geometry shader that adds `h*gl_InvocationID` to the y coordinates of the vertices after perspective projection (where `h` is the desired viewport height) and passes `gl_InvocationID` on to the fragment shader; and
- a fragment shader that `discard`s fragments with y coordinates that satisfy `y < gl_InvocationID*h || y >= (gl_InvocationID+1)*h` (a sketch of both stages follows below).
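Roughly, I imagine the two stages looking something like this (untested; `h_ndc` and `h_px` are my names for the strip height expressed in NDC units and in pixels respectively, and I'm assuming the projection matrix already confines the mesh to a single strip's worth of height):

```glsl
// Geometry shader: render each invocation into its own horizontal strip.
// The offset is applied in clip space, so the NDC offset must be scaled
// by w (NDC y = clip y / w). With 16 strips, h_ndc would be 2.0/16.
#version 410 core

layout(triangles, invocations = 16) in;
layout(triangle_strip, max_vertices = 3) out;

uniform mat4 poseMatrices[16];  // hypothetical uniform name
uniform mat4 projection;
uniform float h_ndc;            // strip height in NDC units

flat out int hypothesisID;      // forwarded to the fragment shader

void main()
{
    for (int i = 0; i < 3; ++i) {
        vec4 p = projection * poseMatrices[gl_InvocationID] * gl_in[i].gl_Position;
        p.y += h_ndc * float(gl_InvocationID) * p.w; // post-projection y offset
        hypothesisID = gl_InvocationID;
        gl_Position = p;
        EmitVertex();
    }
    EndPrimitive();
}
```

```glsl
// Fragment shader: discard fragments outside this hypothesis's strip.
// gl_FragCoord is in window coordinates, so h_px is the strip height
// in pixels (framebuffer height / number of strips).
#version 410 core

flat in int hypothesisID;
uniform float h_px;             // strip height in pixels

out vec4 fragColor;

void main()
{
    if (gl_FragCoord.y <  float(hypothesisID)     * h_px ||
        gl_FragCoord.y >= float(hypothesisID + 1) * h_px)
        discard;
    fragColor = vec4(1.0);      // placeholder shading
}
```

(If I understand correctly, the invocation count would then be bounded by `GL_MAX_GEOMETRY_SHADER_INVOCATIONS` rather than `GL_MAX_VIEWPORTS`, which is why the idea seemed attractive.)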
But I was put off investigating this idea further by the fear that branching and `discard` would be very detrimental to performance.
The authors of the paper above released a technical report describing some of their GPU acceleration methods, but it's not detailed enough to answer my question. Section 3.2.3 says "During geometry instancing, viewport information is attached to every vertex... A custom pixel shader clips pixels that are outside their pre-defined viewports". This sounds similar to the workaround that I've described above, but they were using Direct3D, so it's not easy to compare what they were able to achieve with that in 2011 to what I can achieve today in OpenGL.
I realise that the only definitive answer to my question is to implement the workaround and measure its performance, but it's currently a low-priority curiosity, and I haven't found answers anywhere else, so I hoped that a more experienced GLSL user might be able to offer their time-saving wisdom.