It seems your issue depends on how you manage the session that delivers the raw camera data.
I think you can analyze the camera session in depth and in real time, and track its current status, with this class (MetalCameraSession):
import AVFoundation
import Metal
public protocol MetalCameraSessionDelegate: class {
func metalCameraSession(_ session: MetalCameraSession, didReceiveFrameAsTextures: [MTLTexture], withTimestamp: Double)
func metalCameraSession(_ session: MetalCameraSession, didUpdateState: MetalCameraSessionState, error: MetalCameraSessionError?)
}
public final class MetalCameraSession: NSObject {
public var frameOrientation: AVCaptureVideoOrientation? {
didSet {
guard
let frameOrientation = frameOrientation,
let connection = outputData?.connection(withMediaType: AVMediaTypeVideo),
connection.isVideoOrientationSupported
else { return }
connection.videoOrientation = frameOrientation
}
}
public let captureDevicePosition: AVCaptureDevicePosition
public weak var delegate: MetalCameraSessionDelegate?
public let pixelFormat: MetalCameraPixelFormat
public init(pixelFormat: MetalCameraPixelFormat = .rgb, captureDevicePosition: AVCaptureDevicePosition = .back, delegate: MetalCameraSessionDelegate? = nil) {
self.pixelFormat = pixelFormat
self.captureDevicePosition = captureDevicePosition
self.delegate = delegate
super.init()
NotificationCenter.default.addObserver(self, selector: #selector(captureSessionRuntimeError), name: NSNotification.Name.AVCaptureSessionRuntimeError, object: nil)
}
public func start() {
requestCameraAccess()
captureSessionQueue.async(execute: {
do {
self.captureSession.beginConfiguration()
try self.initializeInputDevice()
try self.initializeOutputData()
self.captureSession.commitConfiguration()
try self.initializeTextureCache()
self.captureSession.startRunning()
self.state = .streaming
}
catch let error as MetalCameraSessionError {
self.handleError(error)
}
catch {
print(error.localizedDescription)
}
})
}
public func stop() {
captureSessionQueue.async(execute: {
self.captureSession.stopRunning()
self.state = .stopped
})
}
fileprivate var state: MetalCameraSessionState = .waiting {
didSet {
guard state != .error else { return }
delegate?.metalCameraSession(self, didUpdateState: state, error: nil)
}
}
fileprivate var captureSession = AVCaptureSession()
internal var captureDevice = MetalCameraCaptureDevice()
fileprivate var captureSessionQueue = DispatchQueue(label: "MetalCameraSessionQueue", attributes: [])
#if arch(i386) || arch(x86_64)
#else
/// Texture cache we will use for converting frame images to textures
internal var textureCache: CVMetalTextureCache?
#endif
fileprivate var metalDevice = MTLCreateSystemDefaultDevice()
internal var inputDevice: AVCaptureDeviceInput? {
didSet {
if let oldValue = oldValue {
captureSession.removeInput(oldValue)
}
if let inputDevice = inputDevice {
captureSession.addInput(inputDevice)
}
}
}
internal var outputData: AVCaptureVideoDataOutput? {
didSet {
if let oldValue = oldValue {
captureSession.removeOutput(oldValue)
}
if let outputData = outputData {
captureSession.addOutput(outputData)
}
}
}
fileprivate func requestCameraAccess() {
captureDevice.requestAccessForMediaType(AVMediaTypeVideo) {
(granted: Bool) -> Void in
guard granted else {
self.handleError(.noHardwareAccess)
return
}
if self.state != .streaming && self.state != .error {
self.state = .ready
}
}
}
fileprivate func handleError(_ error: MetalCameraSessionError) {
if error.isStreamingError() {
state = .error
}
delegate?.metalCameraSession(self, didUpdateState: state, error: error)
}
fileprivate func initializeTextureCache() throws {
#if arch(i386) || arch(x86_64)
throw MetalCameraSessionError.failedToCreateTextureCache
#else
guard
let metalDevice = metalDevice,
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, metalDevice, nil, &textureCache) == kCVReturnSuccess
else {
throw MetalCameraSessionError.failedToCreateTextureCache
}
#endif
}
fileprivate func initializeInputDevice() throws {
var captureInput: AVCaptureDeviceInput!
guard let inputDevice = captureDevice.device(mediaType: AVMediaTypeVideo, position: captureDevicePosition) else {
throw MetalCameraSessionError.requestedHardwareNotFound
}
do {
captureInput = try AVCaptureDeviceInput(device: inputDevice)
}
catch {
throw MetalCameraSessionError.inputDeviceNotAvailable
}
guard captureSession.canAddInput(captureInput) else {
throw MetalCameraSessionError.failedToAddCaptureInputDevice
}
self.inputDevice = captureInput
}
fileprivate func initializeOutputData() throws {
let outputData = AVCaptureVideoDataOutput()
outputData.videoSettings = [
kCVPixelBufferPixelFormatTypeKey as AnyHashable : Int(pixelFormat.coreVideoType)
]
outputData.alwaysDiscardsLateVideoFrames = true
outputData.setSampleBufferDelegate(self, queue: captureSessionQueue)
guard captureSession.canAddOutput(outputData) else {
throw MetalCameraSessionError.failedToAddCaptureOutput
}
self.outputData = outputData
}
@objc
fileprivate func captureSessionRuntimeError() {
if state == .streaming {
handleError(.captureSessionRuntimeError)
}
}
deinit {
NotificationCenter.default.removeObserver(self)
}
}
extension MetalCameraSession: AVCaptureVideoDataOutputSampleBufferDelegate {
#if arch(i386) || arch(x86_64)
#else
private func texture(sampleBuffer: CMSampleBuffer?, textureCache: CVMetalTextureCache?, planeIndex: Int = 0, pixelFormat: MTLPixelFormat = .bgra8Unorm) throws -> MTLTexture {
guard let sampleBuffer = sampleBuffer else {
throw MetalCameraSessionError.missingSampleBuffer
}
guard let textureCache = textureCache else {
throw MetalCameraSessionError.failedToCreateTextureCache
}
guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
throw MetalCameraSessionError.failedToGetImageBuffer
}
let isPlanar = CVPixelBufferIsPlanar(imageBuffer)
let width = isPlanar ? CVPixelBufferGetWidthOfPlane(imageBuffer, planeIndex) : CVPixelBufferGetWidth(imageBuffer)
let height = isPlanar ? CVPixelBufferGetHeightOfPlane(imageBuffer, planeIndex) : CVPixelBufferGetHeight(imageBuffer)
var imageTexture: CVMetalTexture?
let result = CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache, imageBuffer, nil, pixelFormat, width, height, planeIndex, &imageTexture)
guard
let unwrappedImageTexture = imageTexture,
let texture = CVMetalTextureGetTexture(unwrappedImageTexture),
result == kCVReturnSuccess
else {
throw MetalCameraSessionError.failedToCreateTextureFromImage
}
return texture
}
private func timestamp(sampleBuffer: CMSampleBuffer?) throws -> Double {
guard let sampleBuffer = sampleBuffer else {
throw MetalCameraSessionError.missingSampleBuffer
}
let time = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
guard time != kCMTimeInvalid else {
throw MetalCameraSessionError.failedToRetrieveTimestamp
}
return Double(time.value) / Double(time.timescale)
}
@objc public func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
do {
var textures: [MTLTexture]!
switch pixelFormat {
case .rgb:
let textureRGB = try texture(sampleBuffer: sampleBuffer, textureCache: textureCache)
textures = [textureRGB]
case .yCbCr:
let textureY = try texture(sampleBuffer: sampleBuffer, textureCache: textureCache, planeIndex: 0, pixelFormat: .r8Unorm)
let textureCbCr = try texture(sampleBuffer: sampleBuffer, textureCache: textureCache, planeIndex: 1, pixelFormat: .rg8Unorm)
textures = [textureY, textureCbCr]
}
let timestamp = try self.timestamp(sampleBuffer: sampleBuffer)
delegate?.metalCameraSession(self, didReceiveFrameAsTextures: textures, withTimestamp: timestamp)
}
catch let error as MetalCameraSessionError {
self.handleError(error)
}
catch {
print(error.localizedDescription)
}
}
#endif
}
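A side note on requestCameraAccess(): on iOS 10 and later the access request requires the NSCameraUsageDescription key in your Info.plist, otherwise the app is terminated as soon as it touches the camera. You may also want to check the authorization status up front; a minimal sketch, assuming iOS 10+ and the same Swift 3 era APIs used above:
import AVFoundation
// Pre-flight check before calling session.start(). NSCameraUsageDescription
// must be present in Info.plist, or the access request will kill the app.
let status = AVCaptureDevice.authorizationStatus(forMediaType: AVMediaTypeVideo)
if status == .denied {
// The system alert won't be shown again; you'd have to send the user to Settings.
}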
The session relies on this next class to describe the different session states and the errors that can occur (MetalCameraSessionTypes):
import AVFoundation
public enum MetalCameraSessionState {
case ready
case streaming
case stopped
case waiting
case error
}
public enum MetalCameraPixelFormat {
case rgb
case yCbCr
var coreVideoType: OSType {
switch self {
case .rgb:
return kCVPixelFormatType_32BGRA
case .yCbCr:
return kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
}
}
}
public enum MetalCameraSessionError: Error {
case noHardwareAccess
case failedToAddCaptureInputDevice
case failedToAddCaptureOutput
case requestedHardwareNotFound
case inputDeviceNotAvailable
case captureSessionRuntimeError
case failedToCreateTextureCache
case missingSampleBuffer
case failedToGetImageBuffer
case failedToCreateTextureFromImage
case failedToRetrieveTimestamp
public func isStreamingError() -> Bool {
switch self {
case .noHardwareAccess, .failedToAddCaptureInputDevice, .failedToAddCaptureOutput, .requestedHardwareNotFound, .inputDeviceNotAvailable, .captureSessionRuntimeError:
return true
default:
return false
}
}
public var localizedDescription: String {
switch self {
case .noHardwareAccess:
return "Failed to get access to the hardware for a given media type."
case .failedToAddCaptureInputDevice:
return "Failed to add a capture input device to the capture session."
case .failedToAddCaptureOutput:
return "Failed to add a capture output data channel to the capture session."
case .requestedHardwareNotFound:
return "Specified hardware is not available on this device."
case .inputDeviceNotAvailable:
return "Capture input device cannot be opened, probably because it is no longer available or because it is in use."
case .captureSessionRuntimeError:
return "AVCaptureSession runtime error."
case .failedToCreateTextureCache:
return "Failed to initialize texture cache."
case .missingSampleBuffer:
return "No sample buffer to convert the image from."
case .failedToGetImageBuffer:
return "Failed to retrieve an image buffer from camera's output sample buffer."
case .failedToCreateTextureFromImage:
return "Failed to convert the frame to a Metal texture."
case .failedToRetrieveTimestamp:
return "Failed to retrieve timestamp from the sample buffer."
}
}
}
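A note on the two pixel formats: .yCbCr is the camera's native biplanar format (you get a luma plane and a chroma plane, and avoid a conversion pass at capture time), while .rgb gives you a single BGRA texture that is easier to consume. For example, to ask for the native format you could create the session like this (just a sketch, where self is your delegate):
// A sketch: requesting the camera's native biplanar format
// (self is any object conforming to MetalCameraSessionDelegate).
let session = MetalCameraSession(pixelFormat: .yCbCr, captureDevicePosition: .back, delegate: self)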
Then you can use a wrapper around AVFoundation's AVCaptureDevice that exposes instance methods instead of the class ones (MetalCameraCaptureDevice):
import AVFoundation
internal class MetalCameraCaptureDevice {
internal func device(mediaType: String, position: AVCaptureDevicePosition) -> AVCaptureDevice? {
guard let devices = AVCaptureDevice.devices(withMediaType: mediaType) as? [AVCaptureDevice] else { return nil }
if let index = devices.index(where: { $0.position == position }) {
return devices[index]
}
return nil
}
internal func requestAccessForMediaType(_ mediaType: String!, completionHandler handler: ((Bool) -> Void)!) {
AVCaptureDevice.requestAccess(forMediaType: mediaType, completionHandler: handler)
}
}
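If you target iOS 10 or later, note that devices(withMediaType:) is deprecated there. A possible replacement for the device(mediaType:position:) method above, sketched with the newer default-device API (assuming iOS 10 / Swift 3):
internal func device(mediaType: String, position: AVCaptureDevicePosition) -> AVCaptureDevice? {
// Deprecation-free variant for iOS 10+: ask directly for the default
// wide-angle camera at the requested position.
return AVCaptureDevice.defaultDevice(withDeviceType: .builtInWideAngleCamera, mediaType: mediaType, position: position)
}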
Then you can have a custom view controller class to control the camera, like this one (CameraViewController):
import UIKit
import Metal
internal final class CameraViewController: MTKViewController {
var session: MetalCameraSession?
override func viewDidLoad() {
super.viewDidLoad()
session = MetalCameraSession(delegate: self)
}
override func viewWillAppear(_ animated: Bool) {
super.viewWillAppear(animated)
session?.start()
}
override func viewDidDisappear(_ animated: Bool) {
super.viewDidDisappear(animated)
session?.stop()
}
}
// MARK: - MetalCameraSessionDelegate
extension CameraViewController: MetalCameraSessionDelegate {
func metalCameraSession(_ session: MetalCameraSession, didReceiveFrameAsTextures textures: [MTLTexture], withTimestamp timestamp: Double) {
self.texture = textures[0]
}
func metalCameraSession(_ cameraSession: MetalCameraSession, didUpdateState state: MetalCameraSessionState, error: MetalCameraSessionError?) {
if error == .captureSessionRuntimeError {
print(error?.localizedDescription ?? "None")
cameraSession.start()
}
DispatchQueue.main.async {
self.title = "Metal camera: \(state)"
}
print("Session changed state to \(state) with error: \(error?.localizedDescription ?? "None").")
}
}
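Note that textures[0] is the full frame only with the .rgb pixel format; with .yCbCr the delegate receives two textures (the Y plane and the CbCr plane) that you have to combine yourself, typically in a fragment shader. A sketch of the first delegate method that makes the distinction explicit:
func metalCameraSession(_ session: MetalCameraSession, didReceiveFrameAsTextures textures: [MTLTexture], withTimestamp timestamp: Double) {
switch session.pixelFormat {
case .rgb:
// One BGRA texture containing the whole frame.
self.texture = textures[0]
case .yCbCr:
// textures[0] is luma (r8Unorm), textures[1] is chroma (rg8Unorm);
// combine them into RGB in your fragment shader before display.
self.texture = textures[0]
}
}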
Finally, your class could be like this one (MTKViewController), where you have the public func draw(in: MTKView) that gives you exactly the MTLTexture you expect from the camera buffer:
import UIKit
import Metal
#if arch(i386) || arch(x86_64)
#else
import MetalKit
#endif
open class MTKViewController: UIViewController {
open var texture: MTLTexture?
open func willRenderTexture(_ texture: inout MTLTexture, withCommandBuffer commandBuffer: MTLCommandBuffer, device: MTLDevice) {
}
open func didRenderTexture(_ texture: MTLTexture, withCommandBuffer commandBuffer: MTLCommandBuffer, device: MTLDevice) {
}
override open func loadView() {
super.loadView()
#if arch(i386) || arch(x86_64)
NSLog("Failed creating a default system Metal device, since Metal is not available on iOS Simulator.")
#else
assert(device != nil, "Failed creating a default system Metal device. Please, make sure Metal is available on your hardware.")
#endif
initializeMetalView()
initializeRenderPipelineState()
}
fileprivate func initializeMetalView() {
#if arch(i386) || arch(x86_64)
#else
metalView = MTKView(frame: view.bounds, device: device)
metalView.delegate = self
metalView.framebufferOnly = true
metalView.colorPixelFormat = .bgra8Unorm
metalView.contentScaleFactor = UIScreen.main.scale
metalView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
view.insertSubview(metalView, at: 0)
#endif
}
#if arch(i386) || arch(x86_64)
#else
internal var metalView: MTKView!
#endif
internal var device = MTLCreateSystemDefaultDevice()
/// A single command queue, created once and reused for every frame.
internal lazy var commandQueue: MTLCommandQueue? = self.device?.makeCommandQueue()
internal var renderPipelineState: MTLRenderPipelineState?
fileprivate let semaphore = DispatchSemaphore(value: 1)
fileprivate func initializeRenderPipelineState() {
guard
let device = device,
let library = device.newDefaultLibrary()
else { return }
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.sampleCount = 1
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
pipelineDescriptor.depthAttachmentPixelFormat = .invalid
pipelineDescriptor.vertexFunction = library.makeFunction(name: "mapTexture")
pipelineDescriptor.fragmentFunction = library.makeFunction(name: "displayTexture")
do {
try renderPipelineState = device.makeRenderPipelineState(descriptor: pipelineDescriptor)
}
catch {
assertionFailure("Failed creating a render state pipeline. Can't render the texture without one.")
return
}
}
}
#if arch(i386) || arch(x86_64)
#else
extension MTKViewController: MTKViewDelegate {
public func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
NSLog("MTKView drawable size will change to \(size)")
}
public func draw(in view: MTKView) {
_ = semaphore.wait(timeout: DispatchTime.distantFuture)
autoreleasepool {
// Reuse the cached command queue; creating a new queue on every frame is expensive.
guard
var texture = texture,
let device = device,
let commandBuffer = commandQueue?.makeCommandBuffer()
else {
_ = semaphore.signal()
return
}
willRenderTexture(&texture, withCommandBuffer: commandBuffer, device: device)
render(texture: texture, withCommandBuffer: commandBuffer, device: device)
}
}
private func render(texture: MTLTexture, withCommandBuffer commandBuffer: MTLCommandBuffer, device: MTLDevice) {
guard
let currentRenderPassDescriptor = metalView.currentRenderPassDescriptor,
let currentDrawable = metalView.currentDrawable,
let renderPipelineState = renderPipelineState
else {
semaphore.signal()
return
}
let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: currentRenderPassDescriptor)
encoder.pushDebugGroup("RenderFrame")
encoder.setRenderPipelineState(renderPipelineState)
encoder.setFragmentTexture(texture, at: 0)
encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4, instanceCount: 1)
encoder.popDebugGroup()
encoder.endEncoding()
commandBuffer.addScheduledHandler { [weak self] (buffer) in
guard let unwrappedSelf = self else { return }
unwrappedSelf.didRenderTexture(texture, withCommandBuffer: buffer, device: device)
unwrappedSelf.semaphore.signal()
}
commandBuffer.present(currentDrawable)
commandBuffer.commit()
}
}
#endif
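One thing that is easy to miss: the render pipeline above looks up two shader functions, mapTexture and displayTexture, in the default library, so your target must contain a .metal file that defines them (you'll find the author's version in the linked project). As a rough sketch in Metal Shading Language, they are just a full-screen quad plus a pass-through sampler, something like:
#include <metal_stdlib>
using namespace metal;
typedef struct {
float4 renderedCoordinate [[position]];
float2 textureCoordinate;
} TextureMappingVertex;
vertex TextureMappingVertex mapTexture(unsigned int vertex_id [[ vertex_id ]]) {
// Four corners of a full-screen quad in clip space...
float4x4 renderedCoordinates = float4x4(float4(-1.0, -1.0, 0.0, 1.0),
float4( 1.0, -1.0, 0.0, 1.0),
float4(-1.0,  1.0, 0.0, 1.0),
float4( 1.0,  1.0, 0.0, 1.0));
// ...and the matching texture coordinates for the triangle strip.
float4x2 textureCoordinates = float4x2(float2(0.0, 1.0),
float2(1.0, 1.0),
float2(0.0, 0.0),
float2(1.0, 0.0));
TextureMappingVertex outVertex;
outVertex.renderedCoordinate = renderedCoordinates[vertex_id];
outVertex.textureCoordinate = textureCoordinates[vertex_id];
return outVertex;
}
fragment half4 displayTexture(TextureMappingVertex mappingVertex [[ stage_in ]],
texture2d<float, access::sample> texture [[ texture(0) ]]) {
constexpr sampler s(address::clamp_to_edge, filter::linear);
// Simply sample the camera texture at the interpolated coordinate.
return half4(texture.sample(s, mappingVertex.textureCoordinate));
}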
Now you have all the sources, but you can also find the complete GitHub project by navoshta (the author) here, with all the comments and descriptions of the code, as well as a great tutorial about this project here, especially the second part, where you learn how to obtain the texture (you can find this code below, in the MetalCameraSession class):
guard
let unwrappedImageTexture = imageTexture,
let texture = CVMetalTextureGetTexture(unwrappedImageTexture),
result == kCVReturnSuccess
else {
throw MetalCameraSessionError.failedToCreateTextureFromImage
}
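A final caveat: the MTLTexture returned by CVMetalTextureGetTexture is only guaranteed to stay valid while the underlying CVMetalTexture object is alive, so if you ever see flickering or torn frames, keep a reference to unwrappedImageTexture until the command buffer that samples it has completed.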