
I am trying to train an MLModel for image classification. I created an app to collect images to use as training data (in the end, the same process will be used to get predictions). I get a CVPixelBuffer from AVCaptureSession, convert it to a UIImage, and save it to the documents directory as a JPEG. Later I label the images and train the MLModel with Create ML in a playground. Results are 100% in the playground, since I have collected thousands of images.

But when I integrate this model in my app and feed it the same way, the results are awful. I get the CVPixelBuffer, convert it to a UIImage (to crop it), and convert the cropped image back to a CVPixelBuffer to give to the model. I have to convert the UIImage to a CVPixelBuffer because the Core ML model only accepts CVPixelBuffer. I convert a UIImage to a CVPixelBuffer with this method:

// Extension method on UIImage: renders the image into a newly created 32ARGB pixel buffer.
func pixelBuffer(width: Int, height: Int) -> CVPixelBuffer? {
    var maybePixelBuffer: CVPixelBuffer?
    let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                 kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue]
    let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                     width,
                                     height,
                                     kCVPixelFormatType_32ARGB,
                                     attrs as CFDictionary,
                                     &maybePixelBuffer)

    guard status == kCVReturnSuccess, let pixelBuffer = maybePixelBuffer else {
        return nil
    }

    CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
    let pixelData = CVPixelBufferGetBaseAddress(pixelBuffer)

    guard let context = CGContext(data: pixelData,
                                  width: width,
                                  height: height,
                                  bitsPerComponent: 8,
                                  bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue)
        else {
            // Unlock before bailing out so the buffer isn't left locked.
            CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
            return nil
    }

    // Flip the coordinate system so UIKit's top-left-origin drawing lands right side up.
    UIGraphicsPushContext(context)
    context.translateBy(x: 0, y: CGFloat(height))
    context.scaleBy(x: 1, y: -1)
    self.draw(in: CGRect(x: 0, y: 0, width: width, height: height))
    UIGraphicsPopContext()

    CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
    return pixelBuffer
}
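
For context, the call site looks roughly like this ("LetterClassifier", its "image" input, and the 299x299 size are placeholders, not the real names in my project):

func classify(_ cropped: UIImage, with model: LetterClassifier) -> String? {
    // Convert the cropped UIImage back to a CVPixelBuffer and run the classifier.
    guard let buffer = cropped.pixelBuffer(width: 299, height: 299) else { return nil }
    let output = try? model.prediction(image: buffer)   // input name depends on the generated class
    return output?.classLabel
}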

I think I am getting poor results because the Core ML model doesn't like the converted CVPixelBuffer.

Does anyone have any suggestions?

ysnzlcn

1 Answer


You don't need any of this stuff. Let's have a look at the docs:

class VNCoreMLRequest : VNImageBasedRequest

Firstly, VNImageBasedRequest contains the field regionOfInterest: CGRect { get set }, where the rectangle is normalized and relative to the lower-left corner. So you do not need to crop! Simply specify the ROI.

Secondly, VNCoreMLRequest itself has the field var imageCropAndScaleOption: VNImageCropAndScaleOption { get set }, where you can specify what happens when the height/width ratio does not match the expected one (center crop, scale to fit/fill).
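
A minimal sketch of both options together (the LetterClassifier model class, the ROI rectangle, and the print-based result handling are placeholders, not part of the Vision API):

import Vision
import CoreML

// LetterClassifier stands in for the auto-generated Core ML model class.
func classifyCharacter(in pixelBuffer: CVPixelBuffer, roi: CGRect) throws {
    let visionModel = try VNCoreMLModel(for: LetterClassifier().model)

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        guard let results = request.results as? [VNClassificationObservation],
              let top = results.first else { return }
        print(top.identifier, top.confidence)
    }

    // How Vision crops/scales when the ROI's aspect ratio differs from the model input.
    request.imageCropAndScaleOption = .centerCrop

    // Normalized rect with its origin at the lower-left corner of the image.
    request.regionOfInterest = roi

    // Feed the CVPixelBuffer straight from the capture output; no UIImage round trip.
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
    try handler.perform([request])
}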

Maxim Volgin
  • Thanks for answering, Maxim. I am cropping the image because I don't have only one region of interest; there can be up to 50. So I crop the image into (up to) 50 pieces and give them to the ML model, and the model returns results in reasonable time. But as I mentioned, the results are very inaccurate. – ysnzlcn Feb 24 '19 at 16:54
  • So? You can feed it 50 image requests with different ROIs. Although I don't understand your use case. If you need multiple objects, you need an object detector model (i.e. NOT an image classifier). – Maxim Volgin Feb 24 '19 at 16:57
  • Well, I detect objects with Vision, then crop them out of the main image and feed them to the MLModel. Detecting objects is not a problem; the problem is that the MLModel behaves differently than in the playground. I feed it the same data; the only difference is that I convert it to a UIImage and then to a CVPixelBuffer again. – ysnzlcn Feb 24 '19 at 17:08
  • Perhaps the colors are off? The camera typically provides BGRA (not RGBA). But then again, cropping can be done on the CVPixelBuffer without conversion (see the first sketch after this thread) - https://github.com/hollance/CoreMLHelpers/blob/master/CoreMLHelpers/CVPixelBuffer%2BHelpers.swift – Maxim Volgin Feb 24 '19 at 17:14
  • I have tried all the RGBA/BGRA options; the result is the same. But I will try to crop the pixel buffer and feed it to the MLModel, as you said. I will let you know the results. – ysnzlcn Feb 24 '19 at 17:20
  • I cropped the buffer and fed it to the ML model, but the results got worse. I have no idea what to do from this point on. Thanks for the suggestions, btw. – ysnzlcn Feb 24 '19 at 17:50
  • If there is a chance that timing is the problem (i.e. you are not sure whether it is still the same frame that you are cropping), perhaps you can try https://github.com/maxvol/RxVision – Maxim Volgin Feb 24 '19 at 17:55
  • I have checked all the images before giving them to the MLModel, and they are all as I expected. Maybe if I tell you what I am trying to accomplish you will get the idea. I am trying to build an OCR model out of the image classification model. Since I have limited input, performance won't be a problem. Think of a licence plate: I crop out the letters and label them (thousands of them). Then I trained my model, and the results are great in the playground. But in-app it's really inaccurate. – ysnzlcn Feb 24 '19 at 18:03
  • The character rectangles are fed to the classifier one by one? OK. If the settings for the requests are exactly the same (like cropping options) and the images are visually the same, perhaps you can analyze the other settings of the CVPixelBuffers in question. There can be slight differences in defaults between macOS (= the playground) and iOS. So I would log every single setting of these buffers and compare them (see the second sketch after this thread). – Maxim Volgin Feb 24 '19 at 18:09
  • Yep, since there are only 50 possibilities and the font is always the same, I thought training an ML model would give better results than any other OCR engine. Okay, I will try to log every step of converting the UIImage to a pixel buffer so I can create the same data as the training data. I will try it and let you know, so you can update your original answer and I can mark it as the accepted answer. – ysnzlcn Feb 24 '19 at 18:14
  • I have tried every possible combination of settings, and the best one was the first one I began with. Yet it is still producing unreliable results. It might be because I am sending images with different frame sizes. I will train new data with a fixed size, feed the model fixed-size images, and check the results. Thanks for the suggestions anyway. – ysnzlcn Feb 26 '19 at 07:28
  • I see! Yeah, if the training set contains images of different sizes, you need to figure out which cropping/scaling options are being used during training and apply the same ones when using the model. – Maxim Volgin Feb 26 '19 at 08:23
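
On cropping the CVPixelBuffer without going through UIImage: a minimal sketch using Core Image rather than the raw-memory approach in the linked CoreMLHelpers file (the function name and default CIContext are just for illustration; the rect is expected in Core Image's lower-left-origin coordinates):

import CoreImage
import CoreVideo

// Crop a region out of a CVPixelBuffer and render it into a new buffer of the same pixel format.
func crop(_ source: CVPixelBuffer, to rect: CGRect, context: CIContext = CIContext()) -> CVPixelBuffer? {
    let cropped = CIImage(cvPixelBuffer: source).cropped(to: rect)

    var output: CVPixelBuffer?
    let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                 kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue]
    guard CVPixelBufferCreate(kCFAllocatorDefault,
                              Int(rect.width), Int(rect.height),
                              CVPixelBufferGetPixelFormatType(source),
                              attrs as CFDictionary, &output) == kCVReturnSuccess,
          let buffer = output else { return nil }

    // The cropped CIImage keeps its original extent, so shift it back to the origin before rendering.
    let shifted = cropped.transformed(by: CGAffineTransform(translationX: -rect.origin.x,
                                                            y: -rect.origin.y))
    context.render(shifted, to: buffer)
    return buffer
}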
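
And following up on the suggestion to log every setting of the buffers and compare them, a minimal sketch of the kind of CVPixelBuffer inspection meant there (the chosen fields are just examples, not an exhaustive list):

import CoreVideo

// Dump the main properties of a CVPixelBuffer so the training-time and
// inference-time buffers can be compared side by side.
func dumpPixelBufferInfo(_ buffer: CVPixelBuffer, label: String) {
    // Pixel format is an OSType, e.g. kCVPixelFormatType_32ARGB vs kCVPixelFormatType_32BGRA.
    print("[\(label)]",
          "format:", CVPixelBufferGetPixelFormatType(buffer),
          "size:", CVPixelBufferGetWidth(buffer), "x", CVPixelBufferGetHeight(buffer),
          "bytesPerRow:", CVPixelBufferGetBytesPerRow(buffer),
          "planar:", CVPixelBufferIsPlanar(buffer),
          "attachments:", String(describing: CVBufferGetAttachments(buffer, .shouldPropagate)))
}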