As you state, you have several options for this. Whichever you regard as "best" will depend on your specific needs.
Probably your simplest non-open-source route would be to use Core Image. Getting the best performance out of Core Image video filtering still takes a little work, because you need to make sure the processing stays on the GPU rather than round-tripping through the CPU.
In a benchmark application within my GPUImage framework, I have code that uses Core Image in an optimized manner: I set up AV Foundation video capture, create a CIImage from each pixel buffer, and render through a Core Image context that targets an OpenGL ES context, with its properties (working color space, etc.) set for fast rendering. The settings I use there are ones suggested by the Core Image team when I talked to them about this.
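To make that concrete, here's a rough Swift sketch of that kind of setup. The filter choice and the delegate wiring are illustrative rather than lifted from my benchmark, and the GL drawable setup (a GLKView or similar, with its framebuffer bound before drawing) is omitted:

```swift
import AVFoundation
import CoreImage
import OpenGLES

final class CoreImageVideoFilter: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    // Rendering into an OpenGL ES context keeps each frame on the GPU.
    let eaglContext = EAGLContext(api: .openGLES2)!

    // Disabling color management (workingColorSpace: NSNull()) is one of the
    // settings that speeds up per-frame rendering.
    lazy var ciContext = CIContext(eaglContext: eaglContext,
                                   options: [.workingColorSpace: NSNull()])

    let filter = CIFilter(name: "CISepiaTone")!

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // Wrap the camera's pixel buffer without copying it.
        let image = CIImage(cvPixelBuffer: pixelBuffer)
        filter.setValue(image, forKey: kCIInputImageKey)
        guard let filtered = filter.outputImage else { return }

        // Draw straight into the GL-backed context; no CPU readback occurs.
        ciContext.draw(filtered, in: filtered.extent, from: filtered.extent)
    }
}
```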
Going the raw OpenGL ES route is something I talk about here (with a sample application linked there), but it does take some setup. It can give you a little more flexibility than Core Image, because you can write completely custom shaders to manipulate images in ways that you might not be able to in Core Image. It used to be that this was faster than Core Image, but there's effectively no performance gap nowadays.
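As a small taste of that flexibility, a trivial color-inverting fragment shader looks like this (shown as a Swift string literal; the varying and uniform names here follow GPUImage's conventions, but they're whatever your vertex shader and host code establish):

```swift
// A minimal custom fragment shader: sample the input texture and invert it.
// You compile and link this yourself when building a raw OpenGL ES pipeline.
let invertFragmentShader = """
varying highp vec2 textureCoordinate;
uniform sampler2D inputImageTexture;

void main()
{
    lowp vec4 color = texture2D(inputImageTexture, textureCoordinate);
    gl_FragColor = vec4(1.0 - color.rgb, color.a);
}
"""
```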
However, building your own OpenGL ES video processing pipeline isn't simple, and it involves a bunch of boilerplate code. That's why I wrote GPUImage, which I, along with others, have spent a lot of time tuning for performance and ease of use. If you're concerned about not understanding how this all works, read through the GPUImageVideoCamera class code within that framework. That's what pulls frames from the camera and starts the video processing operation. It's a little more complex than my benchmark application, because in most cases it takes in YUV planar frames from the camera and converts those to RGBA in shaders, instead of grabbing raw BGRA frames. The latter is a little simpler, but there are performance and memory optimizations to be had with the former.
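For comparison with the boilerplate above, here's roughly what a whole live filtering chain looks like when driven from Swift through the bridged Objective-C API (the method names are Swift's imports of the Objective-C headers, so double-check them against your version of the framework):

```swift
import UIKit
import AVFoundation
import GPUImage

// Live camera chain: camera -> filter -> view.
// GPUImageVideoCamera wraps the AV Foundation capture session and the
// YUV-to-RGBA shader conversion described above.
let camera = GPUImageVideoCamera(sessionPreset: AVCaptureSession.Preset.vga640x480.rawValue,
                                 cameraPosition: .back)
camera.outputImageOrientation = .portrait

let sepiaFilter = GPUImageSepiaFilter()
let filteredView = GPUImageView(frame: UIScreen.main.bounds)

camera.addTarget(sepiaFilter)
sepiaFilter.addTarget(filteredView)
camera.startCameraCapture()
```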
All of the above covers live video, but prerecorded video is handled in much the same way, only with a different AV Foundation input type. My GPUImageMovie class has code within it to take in prerecorded movies and process individual frames from them; those frames end up in the same place in the pipeline as frames you would have captured from a camera.
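A sketch of that offline path, again with bridged method names that are worth verifying against the framework's own sample applications, and with placeholder URLs:

```swift
import UIKit
import GPUImage

// Offline movie filtering: movie -> filter -> writer.
// The URLs are placeholders, and error handling is omitted for brevity.
let inputURL = URL(fileURLWithPath: "input.m4v")
let outputURL = URL(fileURLWithPath: "output.m4v")

let movieFile = GPUImageMovie(url: inputURL)
let pixellateFilter = GPUImagePixellateFilter()
let movieWriter = GPUImageMovieWriter(movieURL: outputURL,
                                      size: CGSize(width: 640, height: 480))

movieFile.addTarget(pixellateFilter)
pixellateFilter.addTarget(movieWriter)

// Tear down the chain once the last frame has been written out.
movieWriter.completionBlock = {
    pixellateFilter.removeTarget(movieWriter)
    movieWriter.finishRecording()
}

movieWriter.startRecording()
movieFile.startProcessing()
```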