
I have done a ton of research and haven't yet been able to find a viable solution, for reasons I will outline below.


Problem

In my iOS app, I want three views that indefinitely show a delayed-live preview of the device's camera.

For example, view 1 will show a camera view delayed by 5s, view 2 will show the same camera view delayed by 20s, and view 3 will show the same camera view delayed by 30s.

This would be used to record yourself performing some kind of activity, such as a workout exercise, and then watch yourself a few seconds later in order to perfect your form.

Solutions Tried

I have tried and researched a couple of different solutions, but all of them have problems.

1. Using AVFoundation and AVCaptureMovieFileOutput:

  • Use AVCaptureSession and AVCaptureMovieFileOutput to record short clips to device storage (the recording loop is sketched just after this list). Short clips are required because you cannot play video from a URL and write to that same URL simultaneously.
  • Have 3 AVPlayer and AVPlayerLayer instances, all playing the short recorded clips at their desired time-delays.
  • Problems:
    1. When switching clips using AVPlayer.replaceCurrentItem(_:), there is a very noticeable delay between clips. This needs to be a smooth transition.
    2. Although old, a comment here suggests not creating multiple AVPlayer instances due to a device limit. I haven't been able to find information confirming or denying this. Edit: per Jake G's comment below, 10 AVPlayer instances are okay on an iPhone 5 and newer.
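Roughly, the recording half of this approach looks like the sketch below (the ClipRecorder type, the 5-second clip length and the file naming are illustrative, not the exact code I used):

import AVFoundation

// Sketch of the short-clip recording loop from Solution 1. Each clip is a
// separate file, so recording has to stop and restart between clips.
final class ClipRecorder: NSObject, AVCaptureFileOutputRecordingDelegate {
    let session = AVCaptureSession()
    let movieOutput = AVCaptureMovieFileOutput()
    private var clipIndex = 0

    func start() throws {
        let camera = AVCaptureDevice.default(for: .video)!
        session.addInput(try AVCaptureDeviceInput(device: camera))
        session.addOutput(movieOutput)
        // Cap each clip at a few seconds so finished files become playable quickly.
        movieOutput.maxRecordedDuration = CMTime(value: 5, timescale: 1)
        session.startRunning()
        recordNextClip()
    }

    private func recordNextClip() {
        let url = FileManager.default.temporaryDirectory
            .appendingPathComponent("clip-\(clipIndex).mov")
        clipIndex += 1
        movieOutput.startRecording(to: url, recordingDelegate: self)
    }

    // Fires when a clip reaches maxRecordedDuration; hand the finished file to
    // the delayed AVPlayers, then immediately start the next clip.
    func fileOutput(_ output: AVCaptureFileOutput,
                    didFinishRecordingTo outputFileURL: URL,
                    from connections: [AVCaptureConnection],
                    error: Error?) {
        recordNextClip()
    }
}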

2. Using AVFoundation and AVCaptureVideoDataOutput:

  • Use AVCaptureSession and AVCaptureVideoDataOutput to stream and process each frame of the camera's feed using the didOutputSampleBuffer delegate method.
  • Draw each frame on an OpenGL view (such as GLKViewWithBounds). This solves the problem of multiple AVPlayer instances from Solution 1.
  • Problem: Storing each frame so it can be displayed later requires copious amounts of memory (which just isn't viable on an iOS device), or disk space. If I want to store a 2 minute video at 30 frames per second, that's 3600 frames, totalling over 12GB if the buffers are copied directly from didOutputSampleBuffer. Maybe there is a way to compress each frame ~1000x without losing quality that would allow me to keep this data in memory. If such a method exists, I haven't been able to find it.
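For concreteness, here's roughly how that estimate works out, assuming 1280x720 BGRA buffers (real CVPixelBuffers also add row padding):

// Back-of-the-envelope memory cost of holding 2 minutes of raw frames.
// Assumes 1280x720 BGRA at 4 bytes per pixel; actual buffers are a bit larger.
let bytesPerFrame = 1280 * 720 * 4          // ≈ 3.7 MB per frame
let frameCount = 30 * 120                   // 30 fps × 120 s = 3600 frames
let totalBytes = bytesPerFrame * frameCount // ≈ 13.3 GB, i.e. "over 12 GB"
print(Double(totalBytes) / 1_000_000_000)   // ~13.3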

Possible 3rd Solution

If there is a way to read and write to a file simultaneously, I believe the following solution would be ideal.

  • Record video as a circular stream. For example, for a video buffer of 2 minutes, I would create a file output stream that will write frames for two minutes. Once the 2 minute mark is hit, the stream will restart from the beginning, overwriting the original frames.
  • With this file output stream constantly running, I would have 3 input streams on the same recorded video file. Each stream would point to a different frame in the file (effectively X seconds behind the writing stream), and each frame would be displayed on that input stream's respective UIView (the index arithmetic for this is sketched below).

Of course, this still has an issue of storage space. Even if frames were stored as compressed JPEG images, we're talking about multiple GBs of storage required for a lower quality, 2 minute video.
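For what it's worth, the cursor bookkeeping for such a circular buffer is the easy part; the hard parts are the simultaneous read/write and the storage itself. A hypothetical sketch (FrameRing and all of its members are made up for illustration):

// Hypothetical index bookkeeping for a fixed-capacity frame ring.
// capacity = fps × bufferSeconds; storage and file IO are deliberately left out.
struct FrameRing {
    let fps: Int
    let capacity: Int           // e.g. 30 fps × 120 s = 3600 slots
    private(set) var writeIndex = 0
    private(set) var framesWritten = 0

    init(fps: Int, bufferSeconds: Int) {
        self.fps = fps
        self.capacity = fps * bufferSeconds
    }

    mutating func advanceWriter() {
        writeIndex = (writeIndex + 1) % capacity   // wrap: overwrite the oldest slot
        framesWritten += 1
    }

    // Slot a reader delayed by delaySeconds should display, or nil until
    // enough frames have been written to satisfy that delay.
    func readIndex(delaySeconds: Int) -> Int? {
        let delayFrames = delaySeconds * fps
        guard framesWritten >= delayFrames else { return nil }
        return (writeIndex - delayFrames + capacity) % capacity
    }
}

Three readers at 5s, 20s and 30s would each call readIndex(delaySeconds:) against the same ring while the writer keeps advancing.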

Question

  1. Does anyone know of an efficient method to achieve what I want?
  2. How can I fix some of the problems in the solutions I've already tried?
cohenadair
  • Regarding AVPlayer device limit, on iPhone 5 and newer you should be able to have 10 players (video channels, really) allocated simultaneously without an issue. – Jake G Feb 20 '18 at 19:10
  • @cohenadair what did you end up selecting? – denfromufa Mar 26 '21 at 02:49
  • @denfromufa, a combination of all 3 solutions, actually. I ended up creating a circular file storage buffer of short clips and displaying them in sequence using OpenGL. It ended up working quite well. If you want to see the end result, the app is free on the App Store: https://apps.apple.com/us/app/xlr8-skill-system/id1353246743 – cohenadair Mar 27 '21 at 12:54
  • @cohenadair cool, checkout the new answer with new API below: https://stackoverflow.com/a/66829118/2230844 – denfromufa Mar 28 '21 at 02:29

2 Answers


Things have changed since the accepted answer was written. There is now an alternative to a segmented AVCaptureMovieFileOutput that doesn't drop frames on iOS when you create new segments, and that alternative is AVAssetWriter!

As of iOS 14, AVAssetWriter can create fragmented MPEG-4, essentially MPEG-4 files delivered in memory. Intended for HLS streaming applications, it also happens to be an incredibly convenient way to cache video and audio content.

This new capability is described by Takayuki Mizuno in the WWDC 2020 session Author fragmented MPEG-4 content with AVAssetWriter.

With a fragmented mp4 AVAssetWriter in hand, it is not too hard to create a solution to this problem by writing the mp4 segments to disk and playing them back with the desired time offsets using several AVQueuePlayers.

So this would be a fourth solution: capture the camera stream and write it to disk as fragmented mp4 using AVAssetWriter's .mpeg4AppleHLS output profile and play the video back with differing delays using AVQueuePlayers and AVPlayerLayers.

If you need to support iOS 13 and below you'll have to replace the segmented AVAssetWriter, which gets technical quickly, especially if you want to write audio too. Thanks, Takayuki Mizuno!

import UIKit
import AVFoundation
import UniformTypeIdentifiers

class ViewController: UIViewController {
    let playbackDelays:[Int] = [5, 20, 30]
    let segmentDuration = CMTime(value: 2, timescale: 1)

    var assetWriter: AVAssetWriter!
    var videoInput: AVAssetWriterInput!
    var startTime: CMTime!

    var writerStarted = false
    
    let session = AVCaptureSession()
    
    var segment = 0
    var outputDir: URL!
    var initializationData = Data()
    
    var layers: [AVPlayerLayer] = []
    var players: [AVQueuePlayer] = []

    override func viewDidLoad() {
        super.viewDidLoad()
        
        for _ in 0..<playbackDelays.count {
            let player = AVQueuePlayer()
            player.automaticallyWaitsToMinimizeStalling = false
            let layer = AVPlayerLayer(player: player)
            layer.videoGravity = .resizeAspectFill
            layers.append(layer)
            players.append(player)
            view.layer.addSublayer(layer)
        }
        
        outputDir = FileManager.default.urls(for: .documentDirectory, in:.userDomainMask).first!
    
        assetWriter = AVAssetWriter(contentType: UTType.mpeg4Movie)
        assetWriter.outputFileTypeProfile = .mpeg4AppleHLS // fragmented mp4 output!
        assetWriter.preferredOutputSegmentInterval = segmentDuration
        assetWriter.initialSegmentStartTime = .zero
        assetWriter.delegate = self
        
        let videoOutputSettings: [String : Any] = [
            AVVideoCodecKey: AVVideoCodecType.h264,
            AVVideoWidthKey: 1024,
            AVVideoHeightKey: 720
        ]
        videoInput = AVAssetWriterInput(mediaType: .video, outputSettings: videoOutputSettings)
        videoInput.expectsMediaDataInRealTime = true

        assetWriter.add(videoInput)

        // capture session; the device input is named cameraInput so it doesn't
        // shadow the AVAssetWriterInput property declared above
        let videoDevice = AVCaptureDevice.default(for: .video)!
        let cameraInput = try! AVCaptureDeviceInput(device: videoDevice)
        session.addInput(cameraInput)
        
        let videoOutput = AVCaptureVideoDataOutput()
        videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue.main)
        session.addOutput(videoOutput)
        
        session.startRunning()
    }
    
    override func viewDidLayoutSubviews() {
        let size = view.bounds.size
        let layerWidth = size.width / CGFloat(layers.count)
        for i in 0..<layers.count {
            let layer = layers[i]
            layer.frame = CGRect(x: CGFloat(i)*layerWidth, y: 0, width: layerWidth, height: size.height)
        }
    }
    
    override var supportedInterfaceOrientations: UIInterfaceOrientationMask {
        return .landscape
    }
}

extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        
        if startTime == nil {
            // Start writing on the first frame and anchor the writer's session
            // to that frame's presentation timestamp.
            let success = assetWriter.startWriting()
            assert(success)
            startTime = sampleBuffer.presentationTimeStamp
            assetWriter.startSession(atSourceTime: startTime)
        }
        
        if videoInput.isReadyForMoreMediaData {
            videoInput.append(sampleBuffer)
        }
    }
}

extension ViewController: AVAssetWriterDelegate {
    func assetWriter(_ writer: AVAssetWriter, didOutputSegmentData segmentData: Data, segmentType: AVAssetSegmentType) {
        print("segmentType: \(segmentType.rawValue) - size: \(segmentData.count)")
        
        switch segmentType {
        case .initialization:
            // The initialization segment holds the movie header; keep it so it
            // can be prepended to every media segment.
            initializationData = segmentData
        case .separable:
            let fileURL = outputDir.appendingPathComponent(String(format: "%.4i.mp4", segment))
            segment += 1

            // initialization data + media segment = a standalone playable mp4 file
            let mp4Data = initializationData + segmentData
            try! mp4Data.write(to: fileURL)

            let asset = AVAsset(url: fileURL)

            for i in 0..<players.count {
                let player = players[i]
                let playerItem = AVPlayerItem(asset: asset)
                player.insert(playerItem, after: nil)
                
                if player.rate == 0 && player.status == .readyToPlay {
                    // Schedule playback to start exactly playbackDelays[i] seconds
                    // after capture began, measured on the host clock.
                    let hostStartTime: CMTime = startTime + CMTime(value: CMTimeValue(playbackDelays[i]), timescale: 1)

                    player.preroll(atRate: 1) { prerolled in
                        guard prerolled else { return }
                        player.setRate(1, time: .invalid, atHostTime: hostStartTime)
                    }
                }
            }
            
        @unknown default:
            break
        }
    }
}

The result looks like this:

[Screenshot: 4 clocks visible, 1 in the background and 3 on an iPad, with delays of 5, 20 and 30 seconds between them]

and the performance is reasonable: my 2019 iPod sits at 10-14% CPU and 38 MB of memory.

Rhythmic Fistman
  • does this work with different playback speeds? – denfromufa Mar 28 '21 at 18:02
  • Sure does, just pass a different rate in `preroll` and `setRate`. Of course rates <= 1 work best, as > 1 rates will eat up your delays and overtake "now". I now realise that the screenshot should have been of two devices and _four_ clocks. I'll see if I can do a better one. – Rhythmic Fistman Mar 28 '21 at 20:29
  • This is amazing! I've used a very old (but working) setup where I wrote images myself to the document directory with tons of code, this is so much cleaner and up-to-date. Your code runs immediately, however, the final video that it displays is rotated 90 degrees. I wonder why your screenshot doesn't show that... Do you have an idea? – Bob de Graaf Sep 28 '21 at 08:13
  • No idea, Bob - I probably hacked the test code for portrait, is that what you tried? – Rhythmic Fistman Sep 28 '21 at 08:20
  • Yeah I've tried several things now but I can't seem to figure it out yet. It's interesting that your app is in portrait because in the code you override var supportedInterfaceOrientations to only landscape... But what I also see is that in portrait (even though the gravity is set to aspectFill) the size is actually more wide than it should be. But on landscape the size is correct. Is that why you forced landscape? – Bob de Graaf Sep 28 '21 at 09:12
  • Probably - the UI code took a back seat to the media processing in this answer. – Rhythmic Fistman Sep 28 '21 at 09:14
  • Ah ok. Even if I force my App to only support portrait it's still rotated though. I'll try out some more things and see if I can get it working, I'll post something here if I can figure it out. Thanks for your help and answers! – Bob de Graaf Sep 28 '21 at 09:15
  • Hm I really can't get it work, I've been trying all day. I also have the issue that the camera feedback is too wide. I've been looking closer at your screenshot but I think it's the same there as well. If you look closely at the numbers in your real-life clock you can see they are smaller there, you can see it the best for number 8. I think I'm going to ask a new question with a bounty, it's driving me crazy ;) – Bob de Graaf Sep 28 '21 at 15:49
  • Please link to the new question here! – Rhythmic Fistman Sep 28 '21 at 16:32
  • I finally got it working here. Firstly, the current code is not setting the session preset to e.g. 480 x 640. If you do that and set the videoOutputSettings to the same values, the width & height will be perfect. BUT, now there's still the change of orientation to fix. The issue here is that you CAN'T change the videoOutputSettings once the AVAssetWriter started writing. So I simply put those in a different class, and recreate the whole session once it rotates. It works perfectly now :) – Bob de Graaf Oct 12 '21 at 13:22
  • you should post your answer! – Rhythmic Fistman Oct 19 '21 at 13:04

  1. On iOS, AVCaptureMovieFileOutput drops frames when switching files; on macOS this doesn't happen. There's a discussion around this in the header file, see captureOutputShouldProvideSampleAccurateRecordingStart.

A combination of your solutions 2 and 3 should work. You need to write the video file in chunks using AVCaptureVideoDataOutput and AVAssetWriter instead of AVCaptureMovieFileOutput so you don't drop frames. Add 3 ring buffers with enough storage to keep up with playback, and use GLES or Metal to display your buffers (use YUV instead of RGBA: at 1.5 bytes per pixel instead of 4 it needs roughly 2.7x less memory).
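To put rough numbers on the YUV point (720p resolution and the 300 MB budget below are assumptions for illustration, not measurements):

// How many decoded 720p frames fit in a given memory budget.
// 4:2:0 YUV ≈ 1.5 bytes/pixel, BGRA = 4 bytes/pixel; the 300 MB budget is illustrative.
let pixels = 1280 * 720
let bgraFrameBytes = pixels * 4                    // ≈ 3.7 MB per frame
let yuvFrameBytes = Int(Double(pixels) * 1.5)      // ≈ 1.4 MB per frame
let budgetBytes = 300 * 1_000_000
print(budgetBytes / bgraFrameBytes)                // ≈ 81 frames, ~2.7 s at 30 fps
print(budgetBytes / yuvFrameBytes)                 // ≈ 217 frames, ~7.2 s at 30 fps

So each ring buffer only needs to hold a few seconds of decoded frames; the bulk of the delay lives on disk as chunks.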

I tried a more modest version of this back in the days of the mighty iPhone 4s and iPad 2. It showed (I think) "now" and 10s in the past. I guesstimated that because you could encode 30fps at 3x realtime, I should be able to encode the chunks and read the previous ones back using only 2/3 of the hardware capacity. Sadly, either my idea was wrong, or there was a non-linearity in the hardware, or the code was wrong, and the encoder kept falling behind.

Rhythmic Fistman
  • The problem here is there's not enough memory to have 1 ring buffer (let alone 3), even if the frames are stored in a smaller format, like YUV (I tested by converting frames to YUV images and storing in memory; this may be the wrong approach). I've managed to save my videos in chunks, and play them back using multiple `AVPlayer` instances, but that method eventually lags behind recording (so the playback starts at a 5 second delay, but 10 minutes later it's playing at a 20 second delay). – cohenadair Feb 27 '18 at 20:17
  • How are you choosing your ring buffer size? – Rhythmic Fistman Feb 27 '18 at 20:20
  • I didn't choose any size. As far as I know, you can't allocate memory in Swift like you would in C. I was never able to store more than a couple seconds of frames at 20 fps before the app crashed. The problem likely stems from `AVAssetReader` not being able to produce compressed frames, so everything being read is uncompressed. – cohenadair Feb 27 '18 at 23:37
  • Yet video playback is possible! You need to figure out how many of seconds of decompressed frames you need to sustain playback, how much memory is available to you and what frame resolution that implies. Using YUV will lower your memory requirements, increasing your resolution. I wouldn't worry about C and Swift differences. – Rhythmic Fistman Feb 27 '18 at 23:42
  • But `AVAssetReader` doesn't allow you to get decompressed frames. At least, no way that I've found during my research. Video playback is certainly possible directly from a local or remote URL, but storing a buffer of video frames of any decent size in memory is proving to be very difficult. – cohenadair Feb 27 '18 at 23:49
  • What’s a decent size? You haven’t calculated how many frames you need. Maybe it’s a small number. – Rhythmic Fistman Feb 27 '18 at 23:52
  • I want to playback a maximum delay of two minutes. A buffer of that size for 30 fps is `30 * 120 seconds = 3600`. If I can get compressed frames, that might actually be possible, but using `AVAssetReader` can't store more than a couple seconds worth of frames without crashing. – cohenadair Feb 27 '18 at 23:57
  • Yes, but who says you need more than a few seconds of frames in memory at once? The answer said to store the video as chunks in files. Remember, the device can decode at least as fast as real-time, so in an ideal world you’d only need a buffer of 1 frame. In practice you’ll need more than that. When you find out how many for was it 3 streams ? you’ll know if your idea is viable or not. And you can always reduce the dimensions of your video (it doesn’t have to be 1080) until the app becomes viable, assuming the decoding/encoding process itself doesn’t start dominating the process. – Rhythmic Fistman Feb 28 '18 at 07:37
  • Oh, of course, I guess you wouldn't need to load it all at once. The next issue, then -- how do you play the frames back at the same frame rate they were recorded? `AVAssetReader` reads them back much faster than what they were recorded (a 2.5s video at 20 fps loads at ~0.7s)? I know very little about OpenGL/Metal -- can you set a frame rate when rendering a view? – cohenadair Feb 28 '18 at 14:32
  • Frames have presentation timestamps, the time at which they should appear. If you have frames _f0_ and _f1_, then you know the interval during which _f0_ should be displayed. Similarly, your ring buffers represent time intervals too, and for each view, delayed or live, you look up the frame for that time and draw it in GL/Metal in their draw callbacks, which happen at a fixed rate - usually the screen refresh rate. – Rhythmic Fistman Feb 28 '18 at 14:42
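To illustrate the lookup described in that last comment (the TimedFrame type and the function below are hypothetical sketches, not code from either answer): keep each decoded frame alongside its presentation timestamp, and in each draw callback pick the newest frame at or before now minus the view's delay.

import CoreMedia
import CoreVideo

// Hypothetical entry in a decoded-frame ring buffer: a pixel buffer plus its
// presentation timestamp.
struct TimedFrame {
    let pts: CMTime
    let pixelBuffer: CVPixelBuffer
}

// Given frames ordered by ascending pts, return the newest frame whose pts is
// at or before (now - delay); that's what a delayed view draws this refresh.
func frame(for now: CMTime, delayedBy delay: CMTime, in frames: [TimedFrame]) -> TimedFrame? {
    let target = now - delay
    return frames.last(where: { $0.pts <= target })
}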