
I want to execute a Metal (or OpenGL ES 3.0) shader that draws point primitives with blending enabled. To do that, I need to pass every pixel coordinate of the texture to the vertex shader as a vertex; the vertex shader computes the position of the vertex, which is passed on to the fragment shader, and the fragment shader simply outputs the color for that point with blending enabled. My question is whether there is an efficient way to pass the vertex coordinates to the vertex shader, since there would be a huge number of vertices for a 1920x1080 image, and this needs to be done 30 times per second. Something like what the dispatchThreadgroups command does for a compute shader, except that a compute shader cannot draw geometry with blending enabled.

EDIT: This is what I did -

    let vertexFunctionRed = library!.makeFunction(name: "vertexShaderHistogramBlenderRed")
    let fragmentFunctionAccumulator = library!.makeFunction(name: "fragmentShaderHistogramAccumulator")

    let renderPipelineDescriptorRed = MTLRenderPipelineDescriptor()
    renderPipelineDescriptorRed.vertexFunction = vertexFunctionRed
    renderPipelineDescriptorRed.fragmentFunction = fragmentFunctionAccumulator
    renderPipelineDescriptorRed.colorAttachments[0].pixelFormat = .bgra8Unorm
    renderPipelineDescriptorRed.colorAttachments[0].isBlendingEnabled = true
    renderPipelineDescriptorRed.colorAttachments[0].rgbBlendOperation = .add
    renderPipelineDescriptorRed.colorAttachments[0].sourceRGBBlendFactor = .one
    renderPipelineDescriptorRed.colorAttachments[0].destinationRGBBlendFactor = .one

    do {
        histogramPipelineRed = try device.makeRenderPipelineState(descriptor: renderPipelineDescriptorRed)
    } catch {
        print("Unable to compile render pipeline state Histogram Red!")
        return
    }
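The `renderPassDescriptor` used in the drawing code below isn't shown in the post. For additive accumulation it would need to start each frame from zero; a minimal sketch, assuming a hypothetical `histogramTexture` as the accumulation target, might look like this:

```swift
// Sketch (assumption): a render pass descriptor for the accumulation
// target. `histogramTexture` is a hypothetical texture that receives
// the blended per-bucket counts.
let renderPassDescriptor = MTLRenderPassDescriptor()
renderPassDescriptor.colorAttachments[0].texture = histogramTexture
renderPassDescriptor.colorAttachments[0].loadAction = .clear   // zero the counts each frame
renderPassDescriptor.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 0)
renderPassDescriptor.colorAttachments[0].storeAction = .store  // keep the result for readback/display
```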

Drawing code:

    let commandBuffer = commandQueue?.makeCommandBuffer()
    let renderEncoder = commandBuffer?.makeRenderCommandEncoder(descriptor: renderPassDescriptor!)
    renderEncoder?.setRenderPipelineState(histogramPipelineRed!)
    renderEncoder?.setVertexTexture(metalTexture, index: 0)
    // Alternative 1: one vertex, instanced width*height times
    renderEncoder?.drawPrimitives(type: .point, vertexStart: 0, vertexCount: 1, instanceCount: metalTexture!.width*metalTexture!.height)
    // Alternative 2: width*height vertices in a single instance
    renderEncoder?.drawPrimitives(type: .point, vertexStart: 0, vertexCount: metalTexture!.width*metalTexture!.height, instanceCount: 1)
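The snippet stops before the encoder is finished; presumably the pass ends along these lines (a sketch, the readback step is an assumption):

```swift
// Sketch (assumption): finish the pass and submit the work.
renderEncoder?.endEncoding()
commandBuffer?.commit()
// If the histogram is read back on the CPU, wait (or better, use a
// completion handler) before sampling the texture:
commandBuffer?.waitUntilCompleted()
```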

and Shaders:

    vertex MappedVertex vertexShaderHistogramBlenderRed(texture2d<float, access::sample> inputTexture [[ texture(0) ]],
                                                        unsigned int vertexId [[ vertex_id ]])
    {
        MappedVertex out;

        // The sampler is declared with coord::pixel, so sample coordinates
        // must be in pixels, not normalized [0, 1] values.
        constexpr sampler s(s_address::clamp_to_edge, t_address::clamp_to_edge, min_filter::linear, mag_filter::linear, coord::pixel);

        ushort width = inputTexture.get_width();
        ushort height = inputTexture.get_height();

        float X = vertexId % width;
        float Y = vertexId / width;

        // sample() returns a normalized value in [0, 1]; scale to [0, 255]
        // before bucketing (assigning the raw sample to an int would
        // truncate it to 0 or 1).
        float red = inputTexture.sample(s, float2(X, Y)).r * 255.0;

        // 0.0078125 = 2/256: map bucket 0..255 to clip space -1..1
        out.position = float4(-1.0 + (red * 0.0078125), 0.0, 0.0, 1.0);
        out.pointSize = 1.0;
        out.colorFactor = half3(1.0, 0.0, 0.0);

        return out;
    }

    fragment half4 fragmentShaderHistogramAccumulator(MappedVertex in [[ stage_in ]])
    {
        half3 colorFactor = in.colorFactor;
        return half4(colorFactor * (1.0/256.0), 1.0);
    }
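The `MappedVertex` struct isn't shown in the post; a definition consistent with the fields the shaders use would look something like this (an assumption, not the original code):

```metal
struct MappedVertex {
    float4 position  [[ position ]];
    float  pointSize [[ point_size ]]; // required when rasterizing point primitives
    half3  colorFactor;
};
```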
Deepak Sharma
  • Huh? You're trying to use point primitives for every pixel in the render target? Why are you using (or trying to use) point primitives? It sounds like a task for which you should just draw a quad. – Ken Thomases May 24 '18 at 14:01
  • Uff, I tried various approaches to compute image statistics such as Histogram. Tried MPSImageHistogram, custom Metal compute shader that uses atomic_uint to increment statistics, both take 25 milliseconds per frame. May be it is atomic operation that sucks, so trying another approach that simply maps each pixel of texture to a position (0 to 255) and fragment shader simply writes a color to that point with additive blending enabled. This seems very slow at the moment, not sure why. – Deepak Sharma May 24 '18 at 19:44
  • When using any of the techniques you've tried, how did you determine that it was the shader that was taking the time? Are you sure you're not stalling the pipeline? Have you applied the general [MPS tuning hints](https://developer.apple.com/documentation/metalperformanceshaders?language=objc#1674374)? – Ken Thomases May 25 '18 at 01:57
  • That's a good point, but how do I know if pipeline is getting stalled or GPU is busy processing? Please do point me tools that can nail down the issue. And yes, MPS tuning hints are good. Thanks for pointing out. – Deepak Sharma May 25 '18 at 10:56
  • Check [here](https://developer.apple.com/library/content/documentation/DeveloperTools/Conceptual/debugging_with_xcode/chapters/special_debugging_workflows.html#//apple_ref/doc/uid/TP40015022-CH9-SW24) and [here](https://developer.apple.com/library/content/documentation/3DDrawing/Conceptual/MTLBestPracticesGuide/index.html). – Ken Thomases May 25 '18 at 14:40

1 Answer


Maybe you can draw a single point instanced 1920x1080 times. Something like:

vertex float4 my_func(texture2d<float, access::read> image [[texture(0)]],
                      constant uint &width [[buffer(0)]],
                      uint instance_id [[instance_id]])
{
    // decompose the instance ID to a position
    uint2 pos = uint2(instance_id % width, instance_id / width);
    // w must be 1.0 for a valid clip-space position
    return float4(image.read(pos).r * 255, 0, 0, 1);
}
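On the CPU side, the matching draw call for this instanced approach would be along these lines (a sketch; the encoder setup is assumed to mirror the code in the question):

```swift
// Sketch (assumption): one point, instanced once per pixel.
var width = UInt32(metalTexture!.width)
renderEncoder?.setVertexTexture(metalTexture, index: 0)
renderEncoder?.setVertexBytes(&width, length: MemoryLayout<UInt32>.size, index: 0)
renderEncoder?.drawPrimitives(type: .point,
                              vertexStart: 0,
                              vertexCount: 1,
                              instanceCount: metalTexture!.width * metalTexture!.height)
```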
Ken Thomases
  • I did a similar thing but with [[vertex_id]] instead of [[instance_id]] (updated my question). What exactly is the difference? The shader is too slow btw. – Deepak Sharma May 24 '18 at 21:36
  • Really not much difference. – Ken Thomases May 24 '18 at 21:38
  • Is there any way to make the shader faster? I thought blending would be faster than atomics but it is awfully slow. – Deepak Sharma May 24 '18 at 21:43
  • Hard to know. Xcode's GPU debugger can give some insight. Do you need sampling? Reading may be slightly faster. Anyway, in theory, `MPSImageHistogram` should be as fast as Apple could figure out how to do this. – Ken Thomases May 24 '18 at 21:48
  • No, I do not need sampling. In fact I switched from read to sample to see if that improves performance. – Deepak Sharma May 25 '18 at 10:52
  • The problem with MPSImageHistogram is it gives too accurate statistics at the expense of performance. I do not need that much accuracy. For me, saturating the output upto 10 bits is okay for graphical display, which I thought could be achieved by repeated blending. But it is worse. I will post the code just in case I am doing anything wrong in blending. – Deepak Sharma May 25 '18 at 10:54
  • Can you look at this question please - https://stackoverflow.com/questions/57348166/metal-rgb-to-yuv-conversion-compute-shader – Deepak Sharma Aug 04 '19 at 16:40