
I’m trying to develop an augmented reality app with ARKit and OpenCV, using Swift and Objective-C++, on iOS. I need to process the ARKit video frames with OpenCV in real time (in order to replace the video with my transformed version).

To do that, I use frame.capturedImage, which is a CVPixelBuffer, and convert it to a cv::Mat. Then I do my OpenCV processing and convert the result to a UIImage (code below).

It works, but it’s very, very slow (less than 2 FPS). I’m quite new to iOS programming, but I suspect that going through UIImage is not the proper way…

Do you have any ideas? What can I do to optimize my app? Thank you in advance.

Here is my code:

Conversion from buffer to cv::Mat:

+ (Mat)_matFromBuffer:(CVImageBufferRef)buffer {

    Mat mat;
    CVPixelBufferLockBaseAddress(buffer,0);
    //Get the data from the first plane (Y)
    void *address = CVPixelBufferGetBaseAddressOfPlane(buffer, 0);
    int bufferWidth = (int)CVPixelBufferGetWidthOfPlane(buffer,0);
    int bufferHeight = (int)CVPixelBufferGetHeightOfPlane(buffer, 0);
    int bytePerRow = (int)CVPixelBufferGetBytesPerRowOfPlane(buffer, 0);
    //Get the pixel format
    OSType pixelFormat = CVPixelBufferGetPixelFormatType(buffer);

    Mat converted;
    //NOTE: CV_8UC3 means unsigned 8 bits (0-255) per channel, with 3 channels
    //Check to see if this is the correct pixel format
    if (pixelFormat == kCVPixelFormatType_420YpCbCr8BiPlanarFullRange) {
        //We have an ARKIT buffer
        //Get the yPlane (Luma values)
        //Pass the plane's stride (bytes per row) in case rows are padded
        Mat yPlane = Mat(bufferHeight, bufferWidth, CV_8UC1, address, bytePerRow);
        //Get cbcrPlane (Chroma values)
        int cbcrWidth = (int)CVPixelBufferGetWidthOfPlane(buffer, 1);
        int cbcrHeight = (int)CVPixelBufferGetHeightOfPlane(buffer, 1);
        int cbcrBytePerRow = (int)CVPixelBufferGetBytesPerRowOfPlane(buffer, 1);
        void *cbcrAddress = CVPixelBufferGetBaseAddressOfPlane(buffer, 1);
        //Since the Cb and Cr values are interleaved we have 2 channels: Cb and Cr. Thus we need to use CV_8UC2 here.
        Mat cbcrPlane = Mat(cbcrHeight, cbcrWidth, CV_8UC2, cbcrAddress, cbcrBytePerRow);
        //Split them apart so we can merge them with the luma values
        vector<Mat> cbcrPlanes;
        split(cbcrPlane, cbcrPlanes);

        Mat cbPlane;
        Mat crPlane;
        //Since we have a 4:2:0 format, Cb and Cr values are only present for each 2x2 block of luma pixels. Thus we need to enlarge them (by a factor of 2).
        resize(cbcrPlanes[0], cbPlane, yPlane.size(), 0, 0, INTER_NEAREST);
        resize(cbcrPlanes[1], crPlane, yPlane.size(), 0, 0, INTER_NEAREST);

        Mat ycbcr;
        //Note: OpenCV's COLOR_YCrCb2RGB expects the channel order Y, Cr, Cb
        vector<Mat> allPlanes = {yPlane, crPlane, cbPlane};
        merge(allPlanes, ycbcr);
        //ycbcr now contains all three planes. We need to convert it from YCrCb to RGB so OpenCV can work with it
        cvtColor(ycbcr, converted, COLOR_YCrCb2RGB);
    } else {
        //Probably RGB so just use that.
        converted = Mat(bufferHeight, bufferWidth, CV_8UC3, address, bytePerRow).clone();
    }

    //Since we clone the cv::Mat no need to keep the Buffer Locked while we work on it.
    CVPixelBufferUnlockBaseAddress(buffer, 0);

    Mat rotated;
    transpose(converted, rotated);
    flip(rotated,rotated, 1);

    return rotated;
}

Conversion from cv::Mat to UIImage:

+ (UIImage *)_imageFrom:(Mat)source {
    cout << "-> imageFrom\n";

    NSData *data = [NSData dataWithBytes:source.data length:source.elemSize() * source.total()];
    CGDataProviderRef provider = CGDataProviderCreateWithCFData((__bridge CFDataRef)data);

    CGBitmapInfo bitmapFlags = kCGImageAlphaNone | kCGBitmapByteOrderDefault;
    size_t bitsPerComponent = 8;
    size_t bytesPerRow = source.step[0];
    CGColorSpaceRef colorSpace = (source.elemSize() == 1 ? CGColorSpaceCreateDeviceGray() : CGColorSpaceCreateDeviceRGB());

    CGImageRef image = CGImageCreate(source.cols, source.rows, bitsPerComponent, bitsPerComponent * source.elemSize(), bytesPerRow, colorSpace, bitmapFlags, provider, NULL, false, kCGRenderingIntentDefault);
    UIImage *result = [UIImage imageWithCGImage:image];

    CGImageRelease(image);
    CGDataProviderRelease(provider);
    CGColorSpaceRelease(colorSpace);

    return result;
}
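
For context, the wrapper method called from Swift, opencvstuff:, is essentially just these two helpers back to back. Here is a rough sketch of it (the real OpenCV processing is elided, and the CACurrentMediaTime timing lines are only there so I can see which conversion dominates):

#import <QuartzCore/QuartzCore.h> // for CACurrentMediaTime()

+ (UIImage *)opencvstuff:(CVPixelBufferRef)buffer {
    CFTimeInterval t0 = CACurrentMediaTime();
    Mat mat = [self _matFromBuffer:buffer];      // CVPixelBuffer -> cv::Mat
    CFTimeInterval t1 = CACurrentMediaTime();
    // ... actual OpenCV processing on mat goes here ...
    UIImage *result = [self _imageFrom:mat];     // cv::Mat -> UIImage
    CFTimeInterval t2 = CACurrentMediaTime();
    NSLog(@"matFromBuffer: %.1f ms, imageFrom: %.1f ms",
          (t1 - t0) * 1000.0, (t2 - t1) * 1000.0);
    return result;
}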

and in ViewController.swift

func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let openCVWrapper = OpenCVWrapper()
        openCVWrapper.isThisWorking()

        // Make sure we got the expected bi-planar YCbCr buffer
        let pixelBuffer = frame.capturedImage
        if (CVPixelBufferGetPlaneCount(pixelBuffer) < 2) {
            return
        }

        // works but too slow (2 FPS)
        let bckgd = OpenCVWrapper.opencvstuff(pixelBuffer)
        sceneView.scene.background.contents = bckgd
    }
  • It also depends on what you are doing with OpenCV. I have a program that detects a ball with OpenCV and tracks it in ARKit. It runs at around 25 FPS for a single ball, but the FPS drops to about 5 when there are lots of balls. I would also like to see if there is an optimized way. – Alok Subedi Jan 10 '18 at 04:38
  • The OpenCV part is quite light for now; it is mainly used for PCA analysis and colorimetric changes. I made the same application exclusively in C++ and it had absolutely no lag, which makes me think the problem comes from the multiple conversions made for each frame... – V.A Jan 10 '18 at 10:13
  • Did you find a solution? Would you mind sharing it as an answer? I’m having quite the same issue. Thanks. – Alberto Dallaporta Nov 03 '18 at 00:15
  • No, unfortunately... As there was no answer, I worked on something else while waiting for other ideas. – V.A Nov 05 '18 at 12:15

1 Answer


I found a solution in Objective-C; I think you can convert it to Swift yourself.

Draw a circle in capturedImage iOS - ARKit Ycbcr

ARFrame.capturedImage is a YCbCr buffer. From jgh's answer to "How seperate y-planar, u-planar and uv-planar from yuv bi planar in ios?":

The Y plane represents the luminance component, and the UV plane represents the Cb and Cr chroma components.

In the case of the kCVPixelFormatType_420YpCbCr8BiPlanarFullRange format, you will find that the luma plane is 8 bpp with the same dimensions as your video, while the chroma plane is 16 bpp but only a quarter of the size of the original video. You will have one Cb and one Cr component per pixel on this plane.

So if your input video is 352x288, your Y plane will be 352x288 at 8 bpp, and your CbCr plane 176x144 at 16 bpp. This works out to about the same amount of data as a 12 bpp 352x288 image: half of what would be required for RGB888, and still less than RGB565.
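
To check the arithmetic: the Y plane is 352 x 288 x 1 byte ≈ 99 KB, the CbCr plane is 176 x 144 x 2 bytes ≈ 50 KB, so the whole frame is roughly 149 KB, i.e. 1.5 bytes per pixel (12 bpp). The same frame as RGB888 would be 352 x 288 x 3 bytes ≈ 297 KB, and as RGB565 about 198 KB.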

So in the buffer, Y will look like this [YYYYY . . . ] and UV [UVUVUVUVUV . . .]

vs RGB being, of course, [RGBRGBRGB . . . ]

We should do the image processing in cv::Mat directly on yPlane and cbcrPlane; check the code in the linked answer for more details.
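
As a rough illustration of that idea (this is not the exact code from the linked answer, and the method name _processPlanesInPlace: and the GaussianBlur call are just placeholders), you can wrap both planes in cv::Mat headers that point straight into the locked pixel buffer and process them in place, with no resize/merge/cvtColor and no copies:

+ (void)_processPlanesInPlace:(CVPixelBufferRef)buffer {
    CVPixelBufferLockBaseAddress(buffer, 0);

    // Wrap plane 0 (luma) without copying; pass the stride since rows may be padded.
    Mat yPlane((int)CVPixelBufferGetHeightOfPlane(buffer, 0),
               (int)CVPixelBufferGetWidthOfPlane(buffer, 0),
               CV_8UC1,
               CVPixelBufferGetBaseAddressOfPlane(buffer, 0),
               CVPixelBufferGetBytesPerRowOfPlane(buffer, 0));

    // Wrap plane 1 (interleaved CbCr), quarter resolution, 2 channels.
    Mat cbcrPlane((int)CVPixelBufferGetHeightOfPlane(buffer, 1),
                  (int)CVPixelBufferGetWidthOfPlane(buffer, 1),
                  CV_8UC2,
                  CVPixelBufferGetBaseAddressOfPlane(buffer, 1),
                  CVPixelBufferGetBytesPerRowOfPlane(buffer, 1));

    // Example: blur the luma in place; the chroma plane is available the same way
    // if the algorithm needs color information.
    GaussianBlur(yPlane, yPlane, cv::Size(5, 5), 0);
    (void)cbcrPlane;

    CVPixelBufferUnlockBaseAddress(buffer, 0);
}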

– WatersLake