
I'm looking for a way to maintain a seamless audio track while flipping between the front and back cameras. Many apps on the market can do this; one example is Snapchat.

Solutions should use AVCaptureSession and AVAssetWriter. They should explicitly not use AVMutableComposition, since there is currently a bug between AVMutableComposition and AVCaptureSession. I also can't afford post-processing time.

Currently, when I change the video input, the audio recording skips and becomes out of sync.

I’m including the code that could be relevant.

Flip Camera

-(void) updateCameraDirection:(CamDirection)vCameraDirection {
    if(session) {
        AVCaptureDeviceInput* currentInput;
        AVCaptureDeviceInput* newInput;
        BOOL videoMirrored = NO;
        switch (vCameraDirection) {
            case CamDirection_Front:
                currentInput = input_Back;
                newInput = input_Front;
                videoMirrored = NO;
                break;
            case CamDirection_Back:
                currentInput = input_Front;
                newInput = input_Back;
                videoMirrored = YES;
                break;
            default:
                break;
        }

        [session beginConfiguration];
        //disconnect old input
        [session removeInput:currentInput];
        //connect new input
        [session addInput:newInput];
        //get new data connection and config
        dataOutputVideoConnection = [dataOutputVideo connectionWithMediaType:AVMediaTypeVideo];
        dataOutputVideoConnection.videoOrientation = AVCaptureVideoOrientationPortrait;
        dataOutputVideoConnection.videoMirrored = videoMirrored;
        //finish
        [session commitConfiguration];
    }
}

Sample Buffer

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    //not active
    if(!recordingVideo)
        return;

    //start session if not started
    if(!startedSession) {
        startedSession = YES;
        [assetWriter startSessionAtSourceTime:CMSampleBufferGetPresentationTimeStamp(sampleBuffer)];
    }

    //Process sample buffers
    if (connection == dataOutputAudioConnection) {
        if([assetWriterInputAudio isReadyForMoreMediaData]) {
            BOOL success = [assetWriterInputAudio appendSampleBuffer:sampleBuffer];
            //…
        }

    } else if (connection == dataOutputVideoConnection) {
        if([assetWriterInputVideo isReadyForMoreMediaData]) {        
            BOOL success = [assetWriterInputVideo appendSampleBuffer:sampleBuffer];
            //…
        }
    }
}

Perhaps adjust audio sample timeStamp?
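One way to make that idea concrete: track the extra gap that a camera switch introduces into the presentation timestamps, and subtract that accumulated offset from every later buffer before appending. Below is a minimal sketch of just the arithmetic, with a hypothetical `RationalTime` and `TimestampOffsetter` standing in for `CMTime` and the real bookkeeping (in actual capture code the computed offset would be applied with `CMSampleBufferCreateCopyWithNewTiming` before handing the buffer to the writer input):

```swift
// Sketch only: RationalTime models CMTime as a value/timescale pair.
struct RationalTime {
    var value: Int64
    var timescale: Int32
    var seconds: Double { Double(value) / Double(timescale) }
}

/// Accumulates the total gap introduced by source switches so every
/// subsequent buffer can be shifted back by that amount.
struct TimestampOffsetter {
    private var lastPTS: RationalTime?
    private var offsetSeconds = 0.0
    private let expectedGap: Double  // nominal frame duration, e.g. 1/30

    init(expectedGap: Double) { self.expectedGap = expectedGap }

    mutating func adjusted(_ pts: RationalTime) -> RationalTime {
        if let last = lastPTS {
            let gap = pts.seconds - last.seconds
            // Anything much longer than one frame is treated as switch latency.
            if gap > expectedGap * 2 {
                offsetSeconds += gap - expectedGap
            }
        }
        lastPTS = pts
        let shifted = pts.seconds - offsetSeconds
        return RationalTime(value: Int64((shifted * Double(pts.timescale)).rounded()),
                            timescale: pts.timescale)
    }
}
```

Feeding both the audio and video timestamps through the same offsetter would keep them on a common, gap-free timeline.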

Andres Canella
  • I believe Snapchat uses the back camera audio even if you switch to the front facing camera. Try to keep using the audio from the back camera? – Joe Ginley May 08 '17 at 04:50
  • I think I did try that but can't say for sure. Good idea, thanks. – Andres Canella May 10 '17 at 01:36
  • Yeah it is worth a shot, I know I was fixing an iPhone that Siri didn't work which uses the front microphone. Interestingly enough, Snapchat would record front facing videos with audio still. Good luck, let me know what you come up with I am interested to hear! – Joe Ginley May 10 '17 at 04:21
  • Thanks, will post here when I get back to this and look for a solid solution. I ended up just retiming the audio, so there's a little gap ATM. Tinypop app. – Andres Canella May 13 '17 at 00:36

3 Answers


Hey, I was facing the same issue and discovered that after switching cameras the next frame was pushed far out of place. This shifted every frame after it, causing the video and audio to go out of sync. My solution was to move every misplaced frame to its correct position after switching cameras.

Sorry, my answer will be in Swift 4.2.

You'll have to use AVAssetWriterInputPixelBufferAdaptor in order to append the sample buffers at a specific presentation timestamp.

previousPresentationTimeStamp is the presentation timestamp of the previous frame, and currentPresentationTimestamp is, as you guessed, the presentation timestamp of the current one. maxFrameDistance worked very well in testing, but you can change this to your liking.

let currentFramePosition = (Double(self.frameRate) * Double(currentPresentationTimestamp.value)) / Double(currentPresentationTimestamp.timescale)
let previousFramePosition = (Double(self.frameRate) * Double(previousPresentationTimeStamp.value)) / Double(previousPresentationTimeStamp.timescale)
var presentationTimeStamp = currentPresentationTimestamp
let maxFrameDistance = 1.1
let frameDistance = currentFramePosition - previousFramePosition
if frameDistance > maxFrameDistance {
    let expectedFramePosition = previousFramePosition + 1.0
    //print("[mwCamera]: Frame at incorrect position moving from \(currentFramePosition) to \(expectedFramePosition)")

    let newFramePosition = ((expectedFramePosition) * Double(currentPresentationTimestamp.timescale)) / Double(self.frameRate)

    let newPresentationTimeStamp = CMTime.init(value: CMTimeValue(newFramePosition), timescale: currentPresentationTimestamp.timescale)

    presentationTimeStamp = newPresentationTimeStamp
}

let success = assetWriterInputPixelBufferAdator.append(pixelBuffer, withPresentationTime: presentationTimeStamp)
if !success, let error = assetWriter.error {
    fatalError(error.localizedDescription)
}

Also, please note: this worked because I kept the frame rate consistent, so make sure you have total control of the capture device's frame rate throughout this process.

I have a repo using this logic here
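To sanity-check the snippet above with concrete numbers: at 30 fps and a 600 timescale, a frame that lands 15 frame-positions after its predecessor gets snapped to exactly one frame after it. Here is the same retiming arithmetic pulled out into a pure function (a `(value, timescale)` pair stands in for `CMTime`; the constants come from the snippet above):

```swift
// The retiming math from the snippet above, extracted so its behavior
// is easy to check in isolation.
func retimedValue(current: (value: Int64, timescale: Int32),
                  previous: (value: Int64, timescale: Int32),
                  frameRate: Double,
                  maxFrameDistance: Double = 1.1) -> Int64 {
    let currentPos = frameRate * Double(current.value) / Double(current.timescale)
    let previousPos = frameRate * Double(previous.value) / Double(previous.timescale)
    // Frames within the threshold keep their original timestamp.
    guard currentPos - previousPos > maxFrameDistance else { return current.value }
    // Otherwise snap to exactly one frame after the previous frame.
    let expectedPos = previousPos + 1.0
    return Int64((expectedPos * Double(current.timescale) / frameRate).rounded())
}
```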

Woody Jean-louis
  • Good concise response! Thanks for sharing. As you can tell, this is a very old post. I'm currently not working on this, but if the solution works, I'm sure it will help people who see it. – Andres Canella Jun 20 '19 at 15:12

I managed to find an intermediate solution for the sync problem, building on the Woody Jean-louis solution and using his repo.

The results are similar to what Instagram does, but it seems to work a little better. Basically, I prevent the assetWriterAudioInput from appending new samples while switching cameras. There is no way to know exactly when this happens, so I figured out that, before and after the switch, the captureOutput method was sending video samples roughly every 0.02 seconds (0.04 seconds max).

Knowing this, I created a self.lastVideoSampleDate that is updated every time a video sample is appended to the assetWriterInputPixelBufferAdator, and I only allow an audio sample to be appended to the assetWriterAudioInput if the time since that date is less than 0.05 seconds.

    if let assetWriterAudioInput = self.assetWriterAudioInput,
        output == self.audioOutput, assetWriterAudioInput.isReadyForMoreMediaData {

        // Only append audio if a video sample was appended very recently.
        let since = Date().timeIntervalSince(self.lastVideoSampleDate)
        if since < 0.05 {
            let success = assetWriterAudioInput.append(sampleBuffer)
            if !success, let error = assetWriter.error {
                print(error)
                fatalError(error.localizedDescription)
            }
        }
    }

And on the video side, record when the last video sample was appended:

    let success = assetWriterInputPixelBufferAdator.append(pixelBuffer, withPresentationTime: presentationTimeStamp)
    if !success, let error = assetWriter.error {
        print(error)
        fatalError(error.localizedDescription)
    }
    self.lastVideoSampleDate = Date()

The most stable way to fix this problem is to 'pause' recording when switching sources.

But you can also 'fill the gap' with blank video and silent audio frames. This is what I have implemented in my project.

So, create a boolean to block appending new CMSampleBuffers while switching cameras/microphones, and reset it after some delay:

let idleTime = 1.0
self.recordingPaused = true
DispatchQueue.main.asyncAfter(deadline: .now() + idleTime) {
  self.recordingPaused = false
}
writeAllIdleFrames()

In writeAllIdleFrames method you need to calculate how many frames you need to write:

func writeAllIdleFrames() {
    let secondsPerVideoFrame = 1.0 / self.videoConfig.fps
    let secondsPerAudioBuffer = 1024 / self.audioConfig.sampleRate
    
    let videoFramesCount = Int(ceil(self.switchInputDelay / secondsPerVideoFrame))
    let audioFramesCount = Int(ceil(self.switchInputDelay / secondsPerAudioBuffer))
    
    for index in 0..<max(videoFramesCount, audioFramesCount) {
        // create synthetic buffers
        
        recordingQueue.async {
            if index < videoFramesCount {
                let pts = self.nextVideoPTS()
                self.writeBlankVideo(pts: pts)
            }
            
            if index < audioFramesCount {
                let pts = self.nextAudioPTS()
                self.writeSilentAudio(pts: pts)
            }
        }
    }
}

How to calculate next PTS?

func nextVideoPTS() -> CMTime {
    guard var pts = self.lastVideoRawPTS else { return CMTime.invalid }
    
    let secondsPerVideoFrame = 1.0 / self.videoConfig.fps
    let delta = CMTime(value: Int64(secondsPerVideoFrame * Double(pts.timescale)),
                       timescale: pts.timescale, flags: pts.flags, epoch: pts.epoch)
    pts = CMTimeAdd(pts, delta)
    return pts
}
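By analogy (not shown in the answer), a `nextAudioPTS` would advance by one buffer's worth of samples, 1024 at the configured sample rate, matching the arithmetic in `writeAllIdleFrames`. A pure-math sketch of both calculations, with plain numbers standing in for `CMTime` (the 1024-samples-per-buffer figure is taken from the answer above):

```swift
import Foundation

// Sketch of the timing math behind the idle-frame fill, using plain
// numbers in place of CMTime.
func nextAudioPTS(lastPTS: Int64, samplesPerBuffer: Int64 = 1024) -> Int64 {
    // Each silent buffer advances the audio timeline by one buffer of samples.
    return lastPTS + samplesPerBuffer
}

func idleFrameCounts(gapSeconds: Double, fps: Double,
                     sampleRate: Double) -> (video: Int, audio: Int) {
    let secondsPerVideoFrame = 1.0 / fps
    let secondsPerAudioBuffer = 1024.0 / sampleRate
    // Round up so the synthetic frames always cover the whole gap.
    return (video: Int(ceil(gapSeconds / secondsPerVideoFrame)),
            audio: Int(ceil(gapSeconds / secondsPerAudioBuffer)))
}
```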

Tell me if you also need the code that creates blank/silent video/audio buffers :)

MikeSoft
  • Hey Mike, Yes, would love to see the code that creates blank/silent video/audio buffers. Really struggling with this – Christian Ayscue May 23 '21 at 22:41
  • Hey Christian. Sure. I have a repo with some of these utilities. Check it out: https://github.com/MikeSoftZP/swift-utilities/blob/master/swift-utilities/Helpers/CMSampleBuffer%2BUtilities.swift – MikeSoft Jun 03 '21 at 10:25