
What's the right way to add an image overlay to a video created with AVAssetWriter?

It's possible to do so with AVAssetExportSession, but this question is about how to do so with AVAssetWriter so there is more control over the quality and output.

There are two scenarios:

1) Simple: Add single overlay that is present the entire duration of the video (similar to a watermark).

2) Complex: Add different overlays that animate in and out of the video at different times (similar to using AVVideoCompositionCoreAnimationTool).

Crashalot

1 Answer


There are many different approaches to this, and the correct answer depends on exactly what your use case is.

At a high level, here are three approaches:

  1. You appear to be already familiar with AVVideoCompositionCoreAnimationTool. You CAN use this with AVAssetWriter. Check out https://github.com/rs/SDAVAssetExportSession, a drop-in replacement for AVAssetExportSession that allows you to pass the AVAssetWriter settings you're seeking (because it uses AVAssetWriter internally).
  2. If you want to composite something like a watermark into live video (like in this question: Simulate AVLayerVideoGravityResizeAspectFill: crop and center video to mimic preview without losing sharpness), then you can modify the sample buffer that is passed to the captureOutput function by the AVCaptureVideoDataOutputSampleBufferDelegate. The typical approach here is to convert the CMSampleBuffer to a CIImage, do whatever manipulation you like, and finally convert the CIImage BACK to a CMSampleBuffer and write it out. In the question linked, the CMSampleBuffer is simply passed on without any manipulation. NB: the step from CIImage back to CMSampleBuffer is relatively low level; there are lots of examples on StackOverflow, although not many in Swift. Here's one implementation (for OSX, however): Adding filters to video with AVFoundation (OSX) - how do I write the resulting image back to AVWriter?
  3. Depending on just HOW complex your needs are, you could look at implementing your own custom compositor by creating a class that conforms to https://developer.apple.com/library/mac/documentation/AVFoundation/Reference/AVVideoCompositing_Protocol/ and referencing it in the AVVideoComposition. This is complex and (probably) overkill: if you don't know why you need one, then you probably don't. If you start struggling with problems like "how can I have multiple animation layers on different tracks in my video and not all on one track" or "how can I rotate, scale and animate moving video within an image frame - like a polaroid that spins in while the video is playing in the frame"... well, this is what you need to look into.
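For approach #1, here's a minimal sketch of wiring a watermark layer into an AVVideoComposition via AVVideoCompositionCoreAnimationTool; `asset`, `videoSize` and `watermark` are assumed to come from the caller, and the resulting composition can then be handed to something like SDAVAssetExportSession:

```swift
import AVFoundation
import QuartzCore

// Minimal sketch: build a video composition whose animation tool
// composites `watermark` over the video frames.
func makeWatermarkedComposition(asset: AVAsset,
                                videoSize: CGSize,
                                watermark: CALayer) -> AVMutableVideoComposition {
    let videoLayer = CALayer()
    let parentLayer = CALayer()
    parentLayer.frame = CGRect(origin: .zero, size: videoSize)
    videoLayer.frame = parentLayer.frame
    watermark.frame = parentLayer.frame

    parentLayer.addSublayer(videoLayer)
    parentLayer.addSublayer(watermark)   // drawn on top of the video

    // Per the comment thread: never start a CoreAnimation at kCMTimeZero
    // when adding it to the animation tool; use AVCoreAnimationBeginTimeAtZero.
    watermark.beginTime = AVCoreAnimationBeginTimeAtZero

    let composition = AVMutableVideoComposition(propertiesOf: asset)
    composition.renderSize = videoSize
    composition.animationTool = AVVideoCompositionCoreAnimationTool(
        postProcessingAsVideoLayer: videoLayer, in: parentLayer)
    return composition
}
```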
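For approach #2, a hedged sketch of the CMSampleBuffer → CIImage → pixel buffer round trip described above; `WatermarkCompositor` and its method names are hypothetical, not part of any framework:

```swift
import AVFoundation
import CoreImage

// Hypothetical helper: composite a watermark into each video sample buffer.
final class WatermarkCompositor {
    private let ciContext = CIContext()
    private let watermark: CIImage

    init(watermark: CIImage) {
        self.watermark = watermark
    }

    // Call this from captureOutput(_:didOutput:from:) on your
    // AVCaptureVideoDataOutputSampleBufferDelegate, then append the
    // returned pixel buffer via an AVAssetWriterInputPixelBufferAdaptor.
    func composite(into sampleBuffer: CMSampleBuffer) -> CVPixelBuffer? {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return nil
        }
        let frame = CIImage(cvPixelBuffer: pixelBuffer)
        let output = watermark.composited(over: frame)
        // Render the composited image back into the original pixel buffer.
        ciContext.render(output, to: pixelBuffer)
        return pixelBuffer
    }
}
```

When appending, pass the presentation time from CMSampleBufferGetPresentationTimeStamp(\_:) to the adaptor so timing stays intact.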

If you need further info, add some clarification on what you're trying to do and I may be able to expand this answer with more detail on the appropriate approach.

Tim Bull
  • Thanks again, Tim! Will look into these and report. If the watermark doesn't need to be live, i.e., user captures video then we add the watermark, it sounds like #1 would work best? Is there a drawback to #1 or a reason to choose #2 over #1 for non-live overlays? – Crashalot Feb 16 '16 at 19:13
  • No technical reason, just a user experience one. If your app is just producing the video and you're only doing post-processing to add a watermark, IMO it's a better UX to do it all at the same time (use #2). If you're doing some more complex post processing with the recording made by the user, then #1 is a good choice. – Tim Bull Feb 16 '16 at 19:47
  • OK thanks ... to clarify you're basically saying #2 will produce the video faster than #1 so the user waits for less time? – Crashalot Feb 16 '16 at 19:51
  • Yup, exactly. #2 is "real time" or close enough for the user, #1 will have a post-processing time which will be less than the recording time, but still significant. It really depends on what you're doing, what the user expects and how you present it. Not a technical problem, rather a UX question that will help inform the technical choice. – Tim Bull Feb 16 '16 at 19:53
  • 1
    Cool, this is so helpful. Will try #1 first since it's easier and the post-processing is simple for one use case. Adding different multiple overlays (for single track) seems like it requires #3, unfortunately. Wishing you all the luck with Mixbit. Your SO karma is like 3 orders of magnitude too low; you deserve much higher! – Crashalot Feb 16 '16 at 19:57
  • You can add multiple overlays with #1 no problem. You just can't mix and match multiple video tracks within the overlays. If you're not doing this, then #1 is a pretty good solution. The other trick to be aware of with #1 is never start a CoreAnimation at kCMTimeZero when adding it to an AnimationTool. – Tim Bull Feb 16 '16 at 20:05
  • 1
    OK cool, thanks again! Nope, only need a single video track for both use cases. So using an AnimationTool won't affect the quality? The quality is solely dictated by settings in the AVAssetWriter? Also, it takes about 10-20 seconds with AVAssetExportSession to animate multiple overlays (not a single watermark) into a 60-second user video. Will this performance improve with #1 by using AVAssetWriter, or will we need some hybrid of #2? – Crashalot Feb 16 '16 at 20:13
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/103645/discussion-between-tim-bull-and-crashalot). – Tim Bull Feb 16 '16 at 21:03
  • Tim, when toggling the camera did you have to remove/add outputs as well as the inputs and redefine the sessionPreset? Everything seems okay (though slight blurriness when user taps the screen to start/stop recording), but toggling the camera causes problems, one of which is the screen turns black. – Crashalot Feb 22 '16 at 19:53
  • Did anybody find a way to make option #1 faster? Adding a watermark with #1 takes about half the recording time of the video. – Klemen May 05 '16 at 21:08
  • #1 is definitely faster. Less manipulation of the buffers I suspect which is a slow process (comparatively). – Tim Bull May 05 '16 at 21:15
  • Hi Tim, if the text overlay only changes every 5 seconds or so, how would you recommend doing approach #2? Is the only option to add the overlay to every frame, or is there a way to "keep" the overlay in the video and only do frame manipulation every 5 seconds or so when the text changes? From other questions, it seems like changing the overlay for every frame causes lag in the video. – Crashalot Dec 15 '17 at 02:41
  • Does the first solution only work for an already recorded video? If I want to add text while recording video then should I use the second solution? I record video with `AssetWriter` using `CMSampleBuffer` callbacks from inputs (for video and audio devices) – user924 Mar 01 '18 at 09:46
  • Solution #2 is very slow (https://stackoverflow.com/questions/49066195/what-is-the-best-way-to-record-a-video-with-augmented-reality), and #1 isn't for dynamic adding? – user924 Mar 02 '18 at 09:38