
I am using the approach from this post to record a custom Metal view, but I am running into some issues. When I start recording, I drop from 60fps to ~20fps on an iPhone 12 Pro Max. After profiling, the function that slows everything down is texture.getBytes, since it copies the buffer from the GPU back to the CPU.

Another issue, which may or may not be a consequence of this, is that the video and audio are out of sync. I am not sure whether I should go down the semaphore route to solve this or whether there is another potential workaround.

In my case, the texture is as large as the screen, since I create it from the camera stream and then process it through a couple of CIFilters. I am not sure whether the issue is simply that the texture is too big for getBytes to handle in real time.

If I need to define priorities, my #1 priority is to fix the audio/video sync. Any thoughts would be super helpful.

Here is the code:

import AVFoundation
import Metal       // MTLTexture
import QuartzCore  // CACurrentMediaTime

class MetalVideoRecorder {
    var isRecording = false
    var recordingStartTime = TimeInterval(0)

    private var assetWriter: AVAssetWriter
    private var assetWriterVideoInput: AVAssetWriterInput
    private var assetWriterPixelBufferInput: AVAssetWriterInputPixelBufferAdaptor

    init?(outputURL url: URL, size: CGSize) {
        do {
            assetWriter = try AVAssetWriter(outputURL: url, fileType: AVFileType.m4v)
        } catch {
            return nil
        }

        let outputSettings: [String: Any] = [
            AVVideoCodecKey : AVVideoCodecType.h264,
            AVVideoWidthKey : size.width,
            AVVideoHeightKey : size.height ]

        assetWriterVideoInput = AVAssetWriterInput(mediaType: AVMediaType.video, outputSettings: outputSettings)
        assetWriterVideoInput.expectsMediaDataInRealTime = true

        let sourcePixelBufferAttributes: [String: Any] = [
            kCVPixelBufferPixelFormatTypeKey as String : kCVPixelFormatType_32BGRA,
            kCVPixelBufferWidthKey as String : size.width,
            kCVPixelBufferHeightKey as String : size.height ]

        assetWriterPixelBufferInput = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: assetWriterVideoInput,
                                                                           sourcePixelBufferAttributes: sourcePixelBufferAttributes)

        assetWriter.add(assetWriterVideoInput)
    }

    func startRecording() {
        assetWriter.startWriting()
        assetWriter.startSession(atSourceTime: CMTime.zero)

        recordingStartTime = CACurrentMediaTime()
        isRecording = true
    }

    func endRecording(_ completionHandler: @escaping () -> ()) {
        isRecording = false

        assetWriterVideoInput.markAsFinished()
        assetWriter.finishWriting(completionHandler: completionHandler)
    }

    func writeFrame(forTexture texture: MTLTexture) {
        if !isRecording {
            return
        }

        while !assetWriterVideoInput.isReadyForMoreMediaData {}

        guard let pixelBufferPool = assetWriterPixelBufferInput.pixelBufferPool else {
            print("Pixel buffer asset writer input did not have a pixel buffer pool available; cannot retrieve frame")
            return
        }

        var maybePixelBuffer: CVPixelBuffer? = nil
        let status  = CVPixelBufferPoolCreatePixelBuffer(nil, pixelBufferPool, &maybePixelBuffer)
        if status != kCVReturnSuccess {
            print("Could not get pixel buffer from asset writer input; dropping frame...")
            return
        }

        guard let pixelBuffer = maybePixelBuffer else { return }

        CVPixelBufferLockBaseAddress(pixelBuffer, [])
        let pixelBufferBytes = CVPixelBufferGetBaseAddress(pixelBuffer)!

        // Use the bytes per row value from the pixel buffer since its stride may be rounded up to be 16-byte aligned
        let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
        let region = MTLRegionMake2D(0, 0, texture.width, texture.height)

        texture.getBytes(pixelBufferBytes, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)

        let frameTime = CACurrentMediaTime() - recordingStartTime
        let presentationTime = CMTimeMakeWithSeconds(frameTime, preferredTimescale: 240)
        assetWriterPixelBufferInput.append(pixelBuffer, withPresentationTime: presentationTime)

        CVPixelBufferUnlockBaseAddress(pixelBuffer, [])
    }
}
jmrueda
  • Please provide your code. – Hamid Yusifli Mar 23 '21 at 18:49
  • Without more details about your workflow this is really hard to answer. But in principle you should not "record your metal view" but instead record the (filtered) frames coming from the capturer and _also_ display them in a metal view. This way the frames will never have to leave the GPU. – Frank Rupprecht Mar 23 '21 at 19:00
  • I would like to record a video from the processed output (camera stream plus other assets) with audio – jmrueda Mar 23 '21 at 20:24
  • @jmrueda Please provide the part of your code responsible for rendering. – Hamid Yusifli Mar 23 '21 at 20:34
  • @0xBFE1A8 If you could provide any example or further reading, that would be really helpful – jmrueda Apr 29 '21 at 17:44

4 Answers


Unlike OpenGL, Metal doesn't have the concept of a default framebuffer. Instead it uses a technique called a swap chain. A swap chain is a collection of buffers used for displaying frames to the user. Each time an application presents a new frame for display, the first buffer in the swap chain takes the place of the displayed buffer.


When a command queue schedules a command buffer for execution, the drawable tracks all render or write requests on itself in that command buffer. The operating system doesn't present the drawable onscreen until the commands have finished executing. By asking the command buffer to present the drawable, you guarantee that presentation happens after the command queue has scheduled this command buffer. Don’t wait for the command buffer to finish executing before registering the drawable’s presentation.

The layer reuses a drawable only if it isn’t onscreen and there are no strong references to it. They exist within a limited and reusable resource pool, and a drawable may or may not be available when you request one. If none are available, Core Animation blocks your calling thread until a new drawable becomes available — usually at the next display refresh interval.

In your case, the frame recorder keeps a reference to your drawable for too long, which is what causes the frame drops. To avoid this, you should implement a triple buffering model. Adding a third dynamic data buffer is the ideal solution when considering processor idle time, memory overhead, and frame latency.
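
For illustration, here is a minimal sketch of a triple-buffered render loop in Swift; the class and method names are placeholders rather than code from the question. The semaphore caps the number of frames in flight at three and releases a slot only when the GPU has finished a frame, so no drawable (or readback buffer) is held longer than necessary:

import Foundation
import Metal
import QuartzCore

final class TripleBufferedRenderer {
    static let maxFramesInFlight = 3

    private let commandQueue: MTLCommandQueue
    // Allow at most three frames to be encoded ahead of the GPU.
    private let frameSemaphore = DispatchSemaphore(value: TripleBufferedRenderer.maxFramesInFlight)
    // One dynamic data buffer per in-flight frame, so the CPU never writes
    // into a buffer the GPU is still reading.
    private var dynamicBuffers: [MTLBuffer] = []
    private var currentBufferIndex = 0

    init?(device: MTLDevice, bufferLength: Int) {
        guard let queue = device.makeCommandQueue() else { return nil }
        commandQueue = queue
        for _ in 0..<TripleBufferedRenderer.maxFramesInFlight {
            guard let buffer = device.makeBuffer(length: bufferLength, options: .storageModeShared) else { return nil }
            dynamicBuffers.append(buffer)
        }
    }

    func draw(in layer: CAMetalLayer) {
        // Block until one of the three slots is free again.
        frameSemaphore.wait()

        guard let drawable = layer.nextDrawable(),
              let commandBuffer = commandQueue.makeCommandBuffer() else {
            frameSemaphore.signal()
            return
        }

        let dynamicBuffer = dynamicBuffers[currentBufferIndex]
        currentBufferIndex = (currentBufferIndex + 1) % TripleBufferedRenderer.maxFramesInFlight
        _ = dynamicBuffer // write this frame's data here and encode render/compute work

        // Release the slot only when the GPU is done with this frame, so the
        // drawable is not referenced any longer than needed.
        let semaphore = frameSemaphore
        commandBuffer.addCompletedHandler { _ in
            semaphore.signal()
        }

        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}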


Hamid Yusifli

I have encountered the same problem and would like to know if you have solved it. Here is what I know so far.

  1. Everything is being done on the main thread. You can create another serial queue and do the writing and finishWriting asynchronously (see the sketch after this list). My iPhone XS Max can record screen-size video at 60 FPS this way. You can also check the RosyWriter repo; it is a Swift version of Apple's sample that uses AVAssetWriter, and it shows how to keep your video and audio in sync.
  2. getBytes might have a performance issue on A14 devices. With the same code running on an iPhone 12 Pro Max, the output video is laggy and unusable. You can check this thread on the Developer Forums.
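
As a rough illustration of point 1, a sketch along these lines might work; the queue name and the free functions are placeholders, not part of the question's project, and it assumes the texture handed over is not reused by the renderer before the block runs (for example, a dedicated copy):

import Foundation
import Metal

// Placeholder serial queue dedicated to asset writing.
let recordingQueue = DispatchQueue(label: "com.example.metal-recording")

func enqueueFrame(_ texture: MTLTexture, recorder: MetalVideoRecorder) {
    // Hand the frame off so the render loop never blocks on the writer.
    recordingQueue.async {
        recorder.writeFrame(forTexture: texture)
    }
}

func stopRecording(_ recorder: MetalVideoRecorder, completion: @escaping () -> Void) {
    // finishWriting is also kicked off away from the main thread.
    recordingQueue.async {
        recorder.endRecording(completion)
    }
}
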
Noah

I found it around 10× faster to copy the texture data to a buffer and then read the data from the buffer.

Compute shader to copy the texture into the buffer. colorBuf holds RGB uint8 values, so it has size height * width * 3. (WIDTH is presumably a compile-time constant matching the texture width.)

kernel void copyOutputs(
    texture2d<float, access::read> colorTex [[texture(0)]],
    device uint8_t*                colorBuf [[buffer(0)]],
    uint2                          index    [[thread_position_in_grid]]
) {
    float4 color = colorTex.read(index);
    int idx = index.y * WIDTH + index.x;
    int idx3 = idx * 3;
    colorBuf[idx3 + 0] = color.r * 255;
    colorBuf[idx3 + 1] = color.g * 255;
    colorBuf[idx3 + 2] = color.b * 255;
}

I'm using the C++ API, so I do this to copy the data from the buffer into the C-style array color_out. I'm sure you can do a Swift equivalent.

memcpy(color_out, pBufColorOut->contents(), HEIGHT * WIDTH * 3 * sizeof(uint8_t));
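
A Swift equivalent might look roughly like this (untested; it assumes a compute pipeline state built from the copyOutputs kernel above and that width/height match the WIDTH/HEIGHT constants baked into the shader):

import Metal

func readBackRGB(texture: MTLTexture,
                 pipeline: MTLComputePipelineState,
                 commandQueue: MTLCommandQueue,
                 device: MTLDevice,
                 width: Int, height: Int) -> [UInt8]? {
    let count = width * height * 3
    guard let colorBuf = device.makeBuffer(length: count, options: .storageModeShared),
          let commandBuffer = commandQueue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder() else { return nil }

    encoder.setComputePipelineState(pipeline)
    encoder.setTexture(texture, index: 0)
    encoder.setBuffer(colorBuf, offset: 0, index: 0)

    // One thread per pixel (requires a device that supports non-uniform threadgroup sizes).
    let threadsPerGrid = MTLSize(width: width, height: height, depth: 1)
    let threadsPerGroup = MTLSize(width: 16, height: 16, depth: 1)
    encoder.dispatchThreads(threadsPerGrid, threadsPerThreadgroup: threadsPerGroup)
    encoder.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted() // blocking here only to keep the sketch simple

    // Swift counterpart of the memcpy: copy the buffer contents into a byte array.
    let pointer = colorBuf.contents().bindMemory(to: UInt8.self, capacity: count)
    return Array(UnsafeBufferPointer(start: pointer, count: count))
}
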
MichaelCG8

I did not fully understand how to implement @HamidYusifli's proposed solution, so I focused on:

  1. Optimizing the rest of the Metal code (I am doing some real-time image processing)
  2. Fixing the out-of-sync video and audio via AVCaptureSynchronizedData

With this new implementation my code is still consuming quite a lot of CPU (106% on an iPhone 12 plus) and running at ~20fps, but it feels pretty smooth to the user (the audio and video are no longer out of sync).
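
For reference, a rough sketch of the AVCaptureDataOutputSynchronizer setup used for the sync fix (class and queue names here are placeholders, not the actual implementation); the key point is that matched video and audio arrive in a single callback, and the sample-buffer timestamps can be fed to the asset writer instead of CACurrentMediaTime:

import AVFoundation

final class SynchronizedCaptureDelegate: NSObject, AVCaptureDataOutputSynchronizerDelegate {
    let videoOutput = AVCaptureVideoDataOutput()
    let audioOutput = AVCaptureAudioDataOutput()
    private var synchronizer: AVCaptureDataOutputSynchronizer?
    private let dataQueue = DispatchQueue(label: "com.example.capture-sync")

    func attach(to session: AVCaptureSession) {
        if session.canAddOutput(videoOutput) { session.addOutput(videoOutput) }
        if session.canAddOutput(audioOutput) { session.addOutput(audioOutput) }
        // The synchronizer delivers matched video + audio data in one callback.
        let sync = AVCaptureDataOutputSynchronizer(dataOutputs: [videoOutput, audioOutput])
        sync.setDelegate(self, queue: dataQueue)
        synchronizer = sync
    }

    func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer,
                                didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {
        if let videoData = synchronizedDataCollection.synchronizedData(for: videoOutput)
            as? AVCaptureSynchronizedSampleBufferData, !videoData.sampleBufferWasDropped {
            // Use this timestamp (not CACurrentMediaTime) when appending to the asset writer.
            let presentationTime = CMSampleBufferGetPresentationTimeStamp(videoData.sampleBuffer)
            _ = presentationTime // process/filter the frame, then append with this timestamp
        }
        if let audioData = synchronizedDataCollection.synchronizedData(for: audioOutput)
            as? AVCaptureSynchronizedSampleBufferData, !audioData.sampleBufferWasDropped {
            _ = audioData.sampleBuffer // append to a separate audio AVAssetWriterInput
        }
    }
}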

jmrueda