
I have two MLModels in my app. The first one generates an MLMultiArray output which is meant to be used as the second model's input.
As I'm trying to make this as performant as possible, I was thinking about using VNImageRequestHandler, feeding it the first model's output (the MLMultiArray) directly, and letting Vision's resizing and regionOfInterest crop the features for me. That would avoid having to convert the first model's output to an image, do everything manually, and use the regular image initializer.

Something like this:

    let request = VNCoreMLRequest(model: mlModel) { (request, error) in
        // handle logic?
    }

    request.regionOfInterest = // my region

    let handler = VNImageRequestHandler(multiArray: myFirstModelOutputMultiArray)

Or do I have to go through back-and-forth conversions? I'm trying to reduce processing delays.

asked by Roi Mulia

1 Answer

Vision uses images (hence the name ;-) ). If you don't want to use images, you need to use the Core ML API directly.
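
If you stay in the Core ML world, a rough sketch of chaining the two models directly could look like the following. The model variables and the feature names "features" and "input" are placeholders for whatever your models actually declare, not names from your project:

    import CoreML
    import Foundation

    // Rough sketch: `firstModel`/`secondModel` are placeholders for your models,
    // and "features"/"input" stand in for whatever feature names they declare.
    func runPipeline(firstModel: MLModel,
                     secondModel: MLModel,
                     firstInput: MLFeatureProvider) throws -> MLFeatureProvider {
        // Run the first model.
        let firstOutput = try firstModel.prediction(from: firstInput)

        // Pull the MLMultiArray out of the first model's output.
        guard let multiArray = firstOutput.featureValue(for: "features")?.multiArrayValue else {
            throw NSError(domain: "Pipeline", code: -1, userInfo: nil)
        }

        // Wrap it as the second model's input; no image conversion involved.
        let secondInput = try MLDictionaryFeatureProvider(
            dictionary: ["input": MLFeatureValue(multiArray: multiArray)])

        return try secondModel.prediction(from: secondInput)
    }

The downside is that you give up Vision's cropping and scaling, so any region-of-interest logic has to be done on the MLMultiArray yourself.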

If the output from the first model really is an image, it's easiest to change that model's output type to an image so that you get a CVPixelBuffer instead of an MLMultiArray. Then you can directly pass this CVPixelBuffer into the next model using Vision.
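
Declaring an image output is done when the model is converted (for example with coremltools) or by editing the model spec afterwards. Once the first model emits a CVPixelBuffer, the Vision side could look roughly like this; `secondModel`, the region, and the crop option below are illustrative placeholders, not values from your setup:

    import CoreGraphics
    import CoreML
    import CoreVideo
    import Vision

    // Rough sketch: `secondModel` is a placeholder for your compiled second model,
    // and the region/crop values are arbitrary examples.
    func runSecondStage(on pixelBuffer: CVPixelBuffer, secondModel: MLModel) throws {
        let visionModel = try VNCoreMLModel(for: secondModel)

        let request = VNCoreMLRequest(model: visionModel) { request, error in
            // Inspect request.results here, e.g. VNCoreMLFeatureValueObservation
            // or VNClassificationObservation, depending on the model's outputs.
        }
        // Let Vision handle cropping and scaling instead of doing it manually.
        request.regionOfInterest = CGRect(x: 0.25, y: 0.25, width: 0.5, height: 0.5) // normalized
        request.imageCropAndScaleOption = .scaleFill

        // The CVPixelBuffer produced by the first model goes straight into Vision.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try handler.perform([request])
    }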

answered by Matthijs Hollemans
  • That might be the best solution! I didn't know I could pass a pixel buffer directly :) I'll take a look at the Core ML Survival Guide :) – Roi Mulia Jun 15 '21 at 10:47