Metal emulate geometry shaders using compute shaders

Question

I'm trying to implement voxel cone tracing in Metal. One of the steps in the algorithm is to voxelize the geometry using a geometry shader. Metal does not have geometry shaders, so I was looking into emulating them with a compute shader. I pass my vertex buffer to the compute shader, do what a geometry shader would normally do, and write the result to an output buffer. I also add a draw command to an indirect buffer. I then use the output buffer as the vertex buffer for my vertex shader. This works fine, but it needs twice as much memory for my vertices: one copy for the vertex buffer and one for the output buffer. Is there any way to pass the output of the compute shader directly to the vertex shader without storing it in an intermediate buffer? I don't need to keep the contents of the compute shader's output buffer. I just need to hand the results to the vertex shader.

Is this possible? Thanks

EDIT

Essentially, I'm trying to emulate the following GLSL geometry shader:

#version 450

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

layout(location = 0) in vec3 in_position[];
layout(location = 1) in vec3 in_normal[];
layout(location = 2) in vec2 in_uv[];

layout(location = 0) out vec3 out_position;
layout(location = 1) out vec3 out_normal;
layout(location = 2) out vec2 out_uv;

void main()
{
    vec3 p = abs(cross(in_position[1] - in_position[0], in_position[2] - in_position[0]));

    for (uint i = 0; i < 3; ++i)
    {
        out_position = in_position[i];
        out_normal = in_normal[i];
        out_uv = in_uv[i];

        if (p.z > p.x && p.z > p.y)
        {
            gl_Position = vec4(out_position.x, out_position.y, 0, 1);
        }
        else if (p.x > p.y && p.x > p.z)
        {
            gl_Position = vec4(out_position.y, out_position.z, 0, 1);
        }
        else
        {
            gl_Position = vec4(out_position.x, out_position.z, 0, 1);
        }

        EmitVertex();
    }

    EndPrimitive();
}

For each triangle, I need to output a triangle with vertices at these new positions instead. The triangle vertices come from a vertex buffer and are drawn using an index buffer. I also plan on adding code that will do conservative rasterization (just enlarging the triangle slightly), but it's not shown here. Currently, my Metal compute shader uses the index buffer to fetch each vertex, runs the same code as the geometry shader above, and writes the new vertex to another buffer, which I then use to draw.

Even if there were such a feature built into Metal, it would probably just use a buffer internally. If you create your output buffer with private storage mode, it will live entirely on the GPU and never be transferred. It will probably be pretty close in terms of resource usage to what Metal would do internally. — Ken Thomases, May 27 '18 at 23:32
I see. It's just that indexed drawing is also pretty much pointless using this technique because I would end up duplicating vertices anyways. Is there a better way of doing this when using indices? — theonewhoknocks, May 28 '18 at 00:40
It depends on what your geometry shader is doing. Can you share anything about it? Code would be great, but also other stuff like: is the number of output primitives easily calculated from the number of input primitives? Do you have to handle more than one type of input primitive type, like triangle list vs. triangle strip? — Ken Thomases, May 28 '18 at 01:35

Ken Thomases · Accepted Answer · 2018-05-28T04:20:57.170

Here's a very speculative possibility depending on exactly what your geometry shader needs to do.

I'm thinking you can do it sort of "backwards" with just a vertex shader and no separate compute shader, at the cost of redundant work on the GPU. You would do a draw as if you had a buffer of all of the output vertices of the output primitives of the geometry shader. You would not actually have that on hand, though. You would construct a vertex shader that would calculate them in flight.

So, in the app code, calculate the number of output primitives and therefore the number of output vertices that would be produced for a given count of input primitives. Do a draw of the output primitive type with that many vertices.
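
As a rough illustration of that app-side calculation, here is a minimal C++ sketch. The names `InputTopology`, `inputTriangleCount` and `drawVertexCount` are hypothetical helpers, not part of Metal, and the sketch assumes the output primitives are triangles drawn as a triangle list.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of Metal): how many vertices to request from
// a non-indexed triangle-list draw so the vertex shader runs once per output vertex.
enum class InputTopology { TriangleList, TriangleStrip };

// Number of input triangles described by an index buffer of the given length.
constexpr std::uint32_t inputTriangleCount(InputTopology topology, std::uint32_t indexCount) {
    if (topology == InputTopology::TriangleList) {
        return indexCount / 3;                    // three indices per triangle
    }
    return indexCount >= 3 ? indexCount - 2 : 0;  // strip: V indices give V - 2 triangles
}

// Vertex count for the draw, given the maximum number of output triangles the
// emulated geometry shader could emit per input triangle.
constexpr std::uint32_t drawVertexCount(InputTopology topology,
                                        std::uint32_t indexCount,
                                        std::uint32_t maxOutputTrianglesPerInput) {
    return inputTriangleCount(topology, indexCount) * maxOutputTrianglesPerInput * 3;
}
```

For the shader in the question (one output triangle per input triangle, triangle-list indices), this reduces to the index count itself.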

You would not provide a buffer with the output vertex data as input to this draw.

You would provide the original index buffer and original vertex buffer as inputs to the vertex shader for that draw. The shader would calculate from the vertex ID which output primitive it's for, and which vertex of that primitive (e.g. for a triangle, vid / 3 and vid % 3, respectively). From the output primitive ID, it would calculate which input primitive would have generated it in the original geometry shader.

The shader would look up the indices for that input primitive from the index buffer and then the vertex data from the vertex buffer. (This would be sensitive to the distinction between a triangle list vs. triangle strip, for example.) It would apply any pre-geometry-shader vertex shading to that data. Then it would do the part of the geometry computation that contributes to the identified vertex of the identified output primitive. Once it has calculated the output vertex data, you can apply any post-geometry-shader vertex shading(?) that you want. The result is what it would return.
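
The lookup described above can be modelled on the CPU to check the arithmetic. This is only a sketch: `TriangleRef`, `decodeVertexId`, `triangleIndicesFromList` and `triangleIndicesFromStrip` are made-up names, one output triangle per input triangle is assumed, and the alternating winding of strip triangles is ignored.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Which output triangle a vertex ID belongs to, and which corner of it.
struct TriangleRef {
    std::uint32_t triangle;
    std::uint32_t corner;
};

// For a non-indexed triangle-list draw, vid counts 0, 1, 2, ... so these are
// simply vid / 3 and vid % 3.
inline TriangleRef decodeVertexId(std::uint32_t vid) {
    return { vid / 3, vid % 3 };
}

// Triangle-list index buffer: triangle t owns indices 3t, 3t + 1, 3t + 2.
inline std::array<std::uint32_t, 3> triangleIndicesFromList(
        const std::vector<std::uint32_t>& indexes, std::uint32_t triangle) {
    return { indexes[3 * triangle], indexes[3 * triangle + 1], indexes[3 * triangle + 2] };
}

// Triangle-strip index buffer: triangle t reuses indices t, t + 1, t + 2.
// Assumption: winding alternation between strip triangles is not handled here.
inline std::array<std::uint32_t, 3> triangleIndicesFromStrip(
        const std::vector<std::uint32_t>& indexes, std::uint32_t triangle) {
    return { indexes[triangle], indexes[triangle + 1], indexes[triangle + 2] };
}
```

In a vertex shader the same arithmetic would read from the bound index buffer instead of a `std::vector`.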

If the geometry shader can produce a variable number of output primitives per input primitive, well, at least you have a maximum number. So, you can draw the maximum potential count of vertices for the maximum potential count of output primitives. The vertex shader can do the computations necessary to figure out if the geometry shader would have, in fact, produced that primitive. If not, the vertex shader can arrange for the whole primitive to be clipped away, either by positioning it outside of the frustum or using a [[clip_distance]] property of the output vertex data.
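
One way to picture the "position it outside the frustum" option is the small CPU-side model below. `Float4`, `insideClipVolume` and `positionOrCulled` are illustrative names only, and the sketch assumes Metal's clip volume of -w <= x, y <= w and 0 <= z <= w. If every vertex of an unwanted primitive returns the same out-of-volume position, the primitive is degenerate and lies entirely outside the volume, so nothing is rasterized.

```cpp
// Minimal stand-in for a float4 clip-space position.
struct Float4 {
    float x, y, z, w;
};

// Assumed Metal clip volume: -w <= x <= w, -w <= y <= w, 0 <= z <= w.
inline bool insideClipVolume(const Float4& p) {
    return -p.w <= p.x && p.x <= p.w &&
           -p.w <= p.y && p.y <= p.w &&
           0.0f <= p.z && p.z <= p.w;
}

// If the emulated geometry shader would not have emitted this primitive,
// return a fixed position outside the clip volume; otherwise pass the
// computed position through unchanged.
inline Float4 positionOrCulled(bool primitiveIsEmitted, const Float4& computed) {
    if (primitiveIsEmitted) {
        return computed;
    }
    return Float4{2.0f, 2.0f, 2.0f, 1.0f};
}
```

All vertices of a given primitive have to reach the same emitted-or-not decision, otherwise a partial triangle would be drawn.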

This avoids ever storing the generated primitives in a buffer. However, it causes the GPU to do some of the pre-geometry-shader vertex shader and geometry shader calculations repeatedly. It will be parallelized, of course, but may still be slower than what you're doing now. Also, it may defeat some optimizations around fetching indices and vertex data that may be possible with more normal vertex shaders.

Here's an example conversion of your geometry shader:

#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    // maybe need packed types here depending on your vertex buffer layout
    // can't use [[attribute(n)]] for these because Metal isn't doing the vertex lookup for us
    float3 position;
    float3 normal;
    float2 uv;
};

struct VertexOut {
    float3 position;
    float3 normal;
    float2 uv;
    float4 new_position [[position]];
};


vertex VertexOut foo(uint vid [[vertex_id]],
                     device const uint *indexes [[buffer(0)]],
                     device const VertexIn *vertexes [[buffer(1)]])
{
    VertexOut out;

    const uint triangle_id = vid / 3;
    const uint vertex_of_triangle = vid % 3;

    // indexes is for a triangle strip even though this shader is invoked for a triangle list.
    // (For a triangle-list index buffer, read indexes[3 * triangle_id + k] instead.)
    const uint index[3] = { indexes[triangle_id], indexes[triangle_id + 1], indexes[triangle_id + 2] };
    const VertexIn v[3] = { vertexes[index[0]], vertexes[index[1]], vertexes[index[2]] };

    float3 p = abs(cross(v[1].position - v[0].position, v[2].position - v[0].position));

    out.position = v[vertex_of_triangle].position;
    out.normal = v[vertex_of_triangle].normal;
    out.uv = v[vertex_of_triangle].uv;

    if (p.z > p.x && p.z > p.y)
    {
        out.new_position = float4(out.position.x, out.position.y, 0, 1);
    }
    else if (p.x > p.y && p.x > p.z)
    {
        out.new_position = float4(out.position.y, out.position.z, 0, 1);
    }
    else
    {
        out.new_position = float4(out.position.x, out.position.z, 0, 1);
    }

    return out;
}

Thanks for your help. I've updated the question to include the geometry shader I am trying to emulate. My output primitive is triangle strip and I basically just need the other 2 vertices of the triangle I am currently "working on" to calculate the new position. So I would use the vertex id to access the vertex/index buffer in order to get the other 2 vertices of the triangle? — theonewhoknocks, May 28 '18 at 02:50
Well, the geometry shader just produces one triangle per input triangle. That's nicely straightforward. However, it's not clear to me that two input triangles which share a vertex (or two) will produce output triangles that also share vertexes. So, I think you'll have to run the vertex shader I propose against a virtual triangle list, not a triangle strip. That is, you'll have to make sure it's invoked N*3 times, not just N+2 times (where N is the number of triangles; so if V is the original vertex count of a triangle strip, then you'd want (V-2)*3). — Ken Thomases, May 28 '18 at 03:47
I think for my specific case the number of primitives will equal the number of indices / 3. In which case my draw call would be the same (drawIndexedPrimitives), I would just need to also pass the index buffer to the vertex shader and use that to access my vertex buffer in order to get the other 2 vertices of the triangle. This technique seems to be reasonable. I'll give it a shot. — theonewhoknocks, May 28 '18 at 04:25
You won't be able to use `drawIndexedPrimitives`. The vertex ID passed to your shader won't suffice to calculate the triangle ID and thus the indices of the other two vertices. It's the value after looking up the index from the index buffer, not the element number of that index. For example, if your index buffer contains 1,2,3,1,4,5, that's two triangles that share a vertex. Now, your shader is invoked with vertex ID of 1. Which triangle is it for? — Ken Thomases, May 28 '18 at 05:35
I see. Is there an easy way to get the triangle ID? Something similar to gl_PrimitiveID from glsl? — theonewhoknocks, May 28 '18 at 05:48
Oh, never mind. I should be able to just use drawPrimitives with the vertex count as the index count. — theonewhoknocks, May 28 '18 at 06:05
Yup. Or, for the general case, you would use the count of desired output primitives times the number of vertices per primitive. — Ken Thomases, May 28 '18 at 14:17
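
The point in the comments about the index buffer 1,2,3,1,4,5 can be checked with a small CPU-side model. This is a sketch under stated assumptions: `vertexIdsIndexedDraw` and `vertexIdsNonIndexedDraw` are hypothetical names, no base-vertex offset is applied, and any vertex reuse or caching the GPU may perform is ignored.

```cpp
#include <cstdint>
#include <vector>

// Indexed draw: the shader's vertex ID is the value read from the index
// buffer, not the position of that value within the buffer.
inline std::vector<std::uint32_t> vertexIdsIndexedDraw(
        const std::vector<std::uint32_t>& indexes) {
    return indexes;
}

// Non-indexed draw: the vertex IDs are simply 0, 1, ..., vertexCount - 1.
inline std::vector<std::uint32_t> vertexIdsNonIndexedDraw(std::uint32_t vertexCount) {
    std::vector<std::uint32_t> ids(vertexCount);
    for (std::uint32_t i = 0; i < vertexCount; ++i) {
        ids[i] = i;
    }
    return ids;
}
```

With the indexed draw, the value 1 arrives for both triangles, so `vid / 3` cannot tell them apart. With a non-indexed draw of six vertices, `vid / 3` yields 0, 0, 0, 1, 1, 1, which is the triangle ID the shader needs.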

score 0 · Answer 2 · answered Jan 15 '21 at 09:35

Unfortunately there is no way to do this (and other things) in Metal, without going into unneeded complications. The API lacks critical features that are common in Vulkan, OpenGL and DirectX...

Dixie Pal
