
I created a custom CIKernel in Metal. This is useful because it runs close to real time. I am avoiding any CGContext or CIContext work that might lag behind the live feed. My kernel essentially does a Hough transform, but I can't seem to figure out how to read the white points from the image buffer.

Here is kernel.metal:

    #include <CoreImage/CoreImage.h>

    extern "C" {
        namespace coreimage {

            float4 hough(sampler src) {

                // Math

                // More Math

                // eventually:

                if (luminance > 0.8) {
                    float2 position = src.coord(); // sampler-space coordinate of the current pixel
                    // Somehow add this to an array because I need to know the x,y pair
                }

                return float4(luminance, luminance, luminance, 1.0);
            }
        }
    }

I am fine if this part can be extracted to a different kernel or function. The caveat to CIKernel is that its return type is a float4 representing the new color of a pixel. Ideally, instead of an image -> image filter, I would like an image -> array sort of deal, e.g. a reduce instead of a map. I have a bad hunch this will require me to render the image and deal with it on the CPU.

Ultimately, I want to retrieve the qualifying coordinates (there can be multiple per image) back in my Swift function.

FINAL SOLUTION EDIT:

As per the suggestions in the answer, I am doing the large per-pixel calculations on the GPU and some math on the CPU. I designed two additional kernels that work like the built-in reduction kernels. One kernel returns a one-pixel-high image of the highest values in each column, and the other returns a one-pixel-high image of the normalized y-coordinate of the highest value:

    /// Returns the maximum value in each column.
    ///
    /// - Parameter src: a sampler for the input texture
    /// - Returns: maximum value in the column
    float4 maxValueForColumn(sampler src) {

        const float2 size = float2(src.extent().z, src.extent().w);

        /// Destination pixel coordinate, normalized
        const float2 pos = src.coord();

        float maxV = 0;

        for (float y = 0; y < size.y; y++) {
            float v = src.sample(float2(pos.x, y / size.y)).x;
            if (v > maxV) {
                maxV = v;
            }
        }

        return float4(maxV, maxV, maxV, 1.0);
    }

    /// Returns the normalized coordinate of the maximum value in each column.
    ///
    /// - Parameter src: a sampler for the input texture
    /// - Returns: normalized y-coordinate of the maximum value in the column
    float4 maxCoordForColumn(sampler src) {

        const float2 size = float2(src.extent().z, src.extent().w);

        /// Destination pixel coordinate, normalized
        const float2 pos = src.coord();

        float maxV = 0;
        float maxY = 0;

        for (float y = 0; y < size.y; y++) {
            float v = src.sample(float2(pos.x, y / size.y)).x;
            if (v > maxV) {
                maxY = y / size.y;
                maxV = v;
            }
        }

        return float4(maxY, maxY, maxY, 1.0);
    }

This won't give every pixel where luminance is greater than 0.8, but for my purposes, it returns enough: the highest value in each column, and its location.
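
For reference, here is roughly how I wire kernels like these up from Swift; the metallib lookup and error handling are simplified placeholders, but a one-pixel-high output extent plus an ROI callback that covers the full column are what make this kind of kernel work:

    import CoreImage
    import Foundation

    func columnMaxima(of image: CIImage) throws -> (values: CIImage, coords: CIImage)? {
        // The kernels live in a Metal library compiled with the Core Image Metal flags.
        let url = Bundle.main.url(forResource: "default", withExtension: "metallib")!
        let data = try Data(contentsOf: url)

        let maxValueKernel = try CIKernel(functionName: "maxValueForColumn", fromMetalLibraryData: data)
        let maxCoordKernel = try CIKernel(functionName: "maxCoordForColumn", fromMetalLibraryData: data)

        // The output is only one pixel high, but every output pixel reads its
        // whole column, so the ROI callback must request the full input height.
        let outputExtent = CGRect(x: image.extent.minX, y: image.extent.minY,
                                  width: image.extent.width, height: 1)
        let roi: CIKernelROICallback = { _, rect in
            CGRect(x: rect.minX, y: image.extent.minY,
                   width: rect.width, height: image.extent.height)
        }

        guard let values = maxValueKernel.apply(extent: outputExtent, roiCallback: roi, arguments: [image]),
              let coords = maxCoordKernel.apply(extent: outputExtent, roiCallback: roi, arguments: [image]) else {
            return nil
        }
        return (values, coords)
    }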

Pro: copying only (2 * image width) bytes over to the CPU instead of every pixel saves TONS of time (a few ms).

Con: If you have two major white points in the same column, you will never know. You might have to alter this and do calculations by row instead of column if that fits your use-case.
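
In case it helps, reading one of these one-pixel-high results back on the CPU can look like this (the RGBAf format and the nil color space are assumptions; see the follow-up below about value mismatches):

    import CoreImage
    import Foundation

    /// Renders a one-pixel-high CIImage into a Float buffer and returns one value per column.
    func readRow(_ rowImage: CIImage, using context: CIContext) -> [Float] {
        let width = Int(rowImage.extent.width)
        var row = [Float](repeating: 0, count: width * 4)   // RGBAf: 4 floats per pixel

        row.withUnsafeMutableBytes { buffer in
            context.render(rowImage,
                           toBitmap: buffer.baseAddress!,
                           rowBytes: width * 4 * MemoryLayout<Float>.stride,
                           bounds: rowImage.extent,
                           format: .RGBAf,
                           colorSpace: nil)   // nil skips output color matching
        }

        // Every channel carries the same value, so keep only the red component.
        return stride(from: 0, to: row.count, by: 4).map { row[$0] }
    }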

FOLLOW UP:

There seems to be a problem in rendering the outputs. The Float values returned in Metal do not correspond to the UInt8 values I am getting in Swift.

This unanswered question describes the problem.

Edit: This answered question provides a very convenient Metal function. When you call it on a value in the kernel (e.g. 0.5) and return it, you get the correct value (e.g. 128) on the CPU.
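
If the mismatch really is automatic color matching (as discussed in the comments below), a Swift-side alternative to the Metal function is creating the CIContext with color management disabled; this is just a sketch of that workaround:

    import CoreImage
    import Foundation

    // A CIContext that performs no color matching: values written by the kernel
    // are handed back as-is when rendering.
    let linearContext = CIContext(options: [
        .workingColorSpace: NSNull(),   // don't convert inputs into a working space
        .outputColorSpace: NSNull()     // don't match the output to sRGB on the way out
    ])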

Michael

1 Answer


Check out the filters in the CICategoryReduction category (like CIAreaAverage). They return images that are just a few pixels tall, containing the reduction result. But you still have to render them to be able to read the values in your Swift function.
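
For example, reading the result of CIAreaAverage back in Swift could look roughly like this (the context setup and the RGBA8 bitmap format are assumptions):

    import CoreImage

    /// Averages an image with CIAreaAverage and reads the single result pixel back.
    func averageColor(of image: CIImage, context: CIContext) -> [UInt8] {
        let filter = CIFilter(name: "CIAreaAverage", parameters: [
            kCIInputImageKey: image,
            kCIInputExtentKey: CIVector(cgRect: image.extent)
        ])!
        let output = filter.outputImage!          // a 1x1 image containing the average

        var pixel = [UInt8](repeating: 0, count: 4)
        context.render(output,
                       toBitmap: &pixel,
                       rowBytes: 4,
                       bounds: CGRect(x: 0, y: 0, width: 1, height: 1),
                       format: .RGBA8,
                       colorSpace: nil)
        return pixel                              // [R, G, B, A]
    }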

The problem with using this approach for your use case is that you don't know the number of coordinates you are returning beforehand. Core Image needs to know the extent of the output when it calls your kernel, though. You could just assume a static maximum number of coordinates, but that all sounds tedious.

I think you are better off using Accelerate APIs for iterating the pixels of your image (parallelized, super efficiently) on the CPU to find the corresponding coordinates.
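
A sketch of that idea, assuming the per-pixel luminance has already been rendered into a flat, row-major Float buffer (the function name and layout are illustrative):

    import Accelerate

    /// Returns the (x, y) coordinates of all values above `threshold` in a
    /// row-major Float luminance buffer.
    func brightCoordinates(in luminance: [Float], width: Int, threshold: Float = 0.8) -> [(x: Int, y: Int)] {
        // Zero out everything below the threshold in one vectorized pass.
        let clipped = vDSP.threshold(luminance, to: threshold, with: .zeroFill)

        var coordinates: [(x: Int, y: Int)] = []
        for (index, value) in clipped.enumerated() where value > 0 {
            coordinates.append((x: index % width, y: index / width))
        }
        return coordinates
    }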

You could do a hybrid approach where you do the per-pixel heavy math on the GPU with Core Image and then do the analysis on the CPU using Accelerate. You can even integrate the CPU part into your Core Image pipeline using a CIImageProcessorKernel.
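
A rough sketch of what such a processor kernel could look like; the class name, the RGBAf input format, and the pass-through output are placeholders, not a finished implementation:

    import CoreImage
    import Foundation

    class BrightPointProcessor: CIImageProcessorKernel {

        // Request float pixels so the kernel's values arrive unscaled.
        override class func formatForInput(at input: Int32) -> CIFormat { .RGBAf }

        override class func process(with inputs: [CIImageProcessorInput]?,
                                    arguments: [String: Any]?,
                                    output: CIImageProcessorOutput) throws {
            guard let input = inputs?.first else { return }

            let width  = Int(input.region.width)
            let height = Int(input.region.height)
            let floatsPerRow = input.bytesPerRow / MemoryLayout<Float>.stride
            let pixels = input.baseAddress.assumingMemoryBound(to: Float.self)

            // CPU-side analysis; this is where the Accelerate calls would go.
            for y in 0..<height {
                for x in 0..<width where pixels[y * floatsPerRow + x * 4] > 0.8 {
                    // record (x, y), e.g. into a buffer handed in via `arguments`
                }
            }

            // A processor kernel still has to produce an output image;
            // copy the input through row by row so the pipeline stays intact.
            let rows = min(height, Int(output.region.height))
            let bytesPerRow = min(input.bytesPerRow, output.bytesPerRow)
            for y in 0..<rows {
                memcpy(output.baseAddress + y * output.bytesPerRow,
                       input.baseAddress + y * input.bytesPerRow,
                       bytesPerRow)
            }
        }
    }

    // Usage:
    // let processed = try BrightPointProcessor.apply(withExtent: image.extent,
    //                                                inputs: [image], arguments: nil)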

Frank Rupprecht
  • Thanks! I have tried both Accelerate and CIImageProcessorKernel, and this involved copying pixels into memory, which dropped my camera feed to about 11fps (I am ambitiously aiming for at least 30). I am curious what the underlying kernels for CICategoryReduction look like. Not sure if builtins will fit my need, but I might be able to make one. – Michael Jun 13 '19 at 16:35
  • There's no need to copy data since GPU and CPU memory is shared in iOS. From the `CIImageProcessorInput` of the `CIImageProcessorKernel` you can get the `baseAddress` of the underlying data buffer of the input image and operate on that directly using Accelerate. – Frank Rupprecht Jun 14 '19 at 07:38
  • can you explain a little more about how to read the bytes on the CPU? I was using `.getBytes()` and it slowed things down quite a bit. From the docs: "Core Image will concatenate filters in a network into as few kernels as possible, avoiding the creation of intermediate buffers. However, it is unable to do this with image processor kernels." I think this could also be a problem – Michael Jun 14 '19 at 16:16
  • Thinking about it I guess you can't really use a `CIImageProcessorKernel` either since you don't know the number of results. Somehow you _need_ to render your final `CIImage` into some data buffer that you have access to with Accelerate (for instance using `CIContext.render(CIImage, toBitmap: UnsafeMutableRawPointer, rowBytes: Int, bounds: CGRect, format: CIFormat, colorSpace: CGColorSpace?)`). Can you maybe say where your images initially come from and where you need the result to go (i.e. if it's needed further down the pipeline)? – Frank Rupprecht Jun 16 '19 at 11:29
  • do you have any insight into [this question](https://stackoverflow.com/questions/53619209/metal-custom-cifilter-different-return-value) about sRGB-to-linear conversion? I am reading different values than my kernel returns – Michael Jun 21 '19 at 15:47
  • @MichaelAustin Yes, it's most likely due to automatic color matching. See my answer on the linked question. – Frank Rupprecht Jun 22 '19 at 16:41