
I'm looking for the fastest way to decode a local MPEG-4 video's frames on the iPhone. I'm simply interested in the luminance values of the pixels in every 10th frame. I don't need to render the video anywhere.

I've tried ffmpeg, AVAssetReader, AVAssetImageGenerator, OpenCV, and MPMoviePlayer but they're all too slow. The fastest speed I can get is ~2x (2 minutes of video scanned in a minute). I'd like something closer to 10x.

Assuming my attempts above didn't utilize the GPU, is there any way to accomplish my goal with something that does run on the GPU? OpenGL seems to be mostly for rendering output, but I have seen it used to filter incoming video. Maybe that's an option?

Thanks in advance!

simon.d

3 Answers


If you are willing to use an iOS 5-only solution, take a look at the sample app ChromaKey from the 2011 WWDC session on AVCaptureSession.

That demo captures 30 FPS of video from the built-in camera and passes each frame to OpenGL as a texture. It then uses OpenGL to manipulate the frame, and optionally writes the result out to an output video file.

The code uses some serious low-level magic to bind a Core Video pixel buffer from an AVCaptureSession to OpenGL so the two share memory in the graphics hardware.
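
For reference, the low-level piece being described is the CVOpenGLESTextureCache API that arrived in iOS 5. A minimal sketch of the idea, assuming you already have an EAGLContext and biplanar Y/UV pixel buffers coming out of the session (the function names here are just illustrative, and all error handling is omitted):

#import <CoreVideo/CoreVideo.h>
#import <CoreVideo/CVOpenGLESTextureCache.h>
#import <OpenGLES/EAGL.h>
#import <OpenGLES/ES2/gl.h>

// One-time setup: the texture cache is tied to your EAGLContext.
// (Depending on the SDK, the context argument may need a (__bridge void *) cast under ARC.)
static CVOpenGLESTextureCacheRef CreateTextureCache(EAGLContext *context)
{
    CVOpenGLESTextureCacheRef cache = NULL;
    CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, context, NULL, &cache);
    return cache;
}

// Per frame: expose the luma (Y) plane of a biplanar pixel buffer as a GL texture
// without a CPU copy. The caller must CFRelease() the returned texture when done
// and periodically call CVOpenGLESTextureCacheFlush() on the cache.
static CVOpenGLESTextureRef BindLumaPlane(CVOpenGLESTextureCacheRef cache,
                                          CVPixelBufferRef pixelBuffer)
{
    CVOpenGLESTextureRef lumaTexture = NULL;
    CVOpenGLESTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, pixelBuffer, NULL,
        GL_TEXTURE_2D, GL_LUMINANCE,
        (GLsizei)CVPixelBufferGetWidthOfPlane(pixelBuffer, 0),
        (GLsizei)CVPixelBufferGetHeightOfPlane(pixelBuffer, 0),
        GL_LUMINANCE, GL_UNSIGNED_BYTE,
        0,                                   // plane 0 = luminance
        &lumaTexture);
    glBindTexture(CVOpenGLESTextureGetTarget(lumaTexture),
                  CVOpenGLESTextureGetName(lumaTexture));
    return lumaTexture;
}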

It should be fairly straightforward to change the AVCaptureSession to use a movie file as input rather than camera input.

You could probably set up the session to deliver frames in Y/UV form rather than RGB, where the Y component is luminance. Failing that, it would be a pretty simple matter to write a shader that would convert RGB values for each pixel to luminance values.
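
For a local movie file, one way to get the same kind of Y/UV pixel buffers is AVAssetReader. A rough sketch (hypothetical function name, no error handling) that requests biplanar Y/UV output, so plane 0 of every decoded frame is already the luminance data, and only looks at every 10th frame:

#import <AVFoundation/AVFoundation.h>
#import <CoreMedia/CoreMedia.h>
#import <CoreVideo/CoreVideo.h>

static void ScanLuminance(NSURL *movieURL)
{
    AVURLAsset *asset = [AVURLAsset URLAssetWithURL:movieURL options:nil];
    AVAssetTrack *videoTrack = [[asset tracksWithMediaType:AVMediaTypeVideo] objectAtIndex:0];

    // Ask for '420v' biplanar output: plane 0 is the Y (luminance) plane.
    NSDictionary *settings = [NSDictionary dictionaryWithObject:
        [NSNumber numberWithUnsignedInt:kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange]
                                                          forKey:(id)kCVPixelBufferPixelFormatTypeKey];

    AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:NULL];
    AVAssetReaderTrackOutput *output =
        [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:videoTrack
                                                   outputSettings:settings];
    [reader addOutput:output];
    [reader startReading];

    NSUInteger frameIndex = 0;
    CMSampleBufferRef sample = NULL;
    while ((sample = [output copyNextSampleBuffer])) {
        if (frameIndex++ % 10 == 0) {
            CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sample);
            CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
            uint8_t *luma   = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0); // Y plane
            size_t width    = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
            size_t height   = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
            size_t rowBytes = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
            // ... inspect width x height luminance values in `luma`, row stride `rowBytes` ...
            CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
        }
        CFRelease(sample);
    }
}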

You should be able to do all of this on ALL frames, not just every 10th frame.

Duncan C
  • Bummer, it looks like I need to be a WWDC 2011 attendee to get that sample. I still worry that effectively this is real-time transcoding. I want to get 15x speeds (15 minutes of video scanned in 1 minute). I think the bottleneck is in the frame decoding. – simon.d Mar 02 '12 at 17:35
  • @simon.d - I describe the technique used in the ChromaKey example in my answer here: http://stackoverflow.com/a/9704392/19679 , and you can grab my GPUImage code to see this in action for encoding movies. I've not yet updated my movie reading code to use fast texture uploads, though. Due to the fact that iOS devices have dedicated hardware for decoding H.264, I feel reasonably certain saying that you'll not get any faster parsing for movies than using AVFoundation with the iOS 5.0 fast texture uploads. – Brad Larson Mar 22 '12 at 23:34
  • Apple's RosyWriter example code also demonstrates this AVCaptureSession -> OpenGL link. See [here](https://developer.apple.com/library/ios/samplecode/RosyWriter/Introduction/Intro.html). – bcattle May 27 '14 at 18:57

vImage might be appropriate, assuming you can use iOS 5. Processing every 10th frame seems well within reason for a framework like vImage. However, any kind of actual real-time processing is almost certainly going to require OpenGL.
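
Purely as an illustration of the kind of call involved (a sketch, not taken from any Apple sample), computing luminance from an already-decoded ARGB frame with vImage could look roughly like this; the 4x4 integer matrix just encodes the usual Rec. 601 luma weights:

#include <Accelerate/Accelerate.h>

// Hypothetical sketch: fill the R, G and B channels of the destination with the
// luminance of the source. `pixels` is assumed to be interleaved 8-bit ARGB data
// you already obtained elsewhere; the return value of vImage is not checked.
void LuminanceWithVImage(void *pixels, size_t width, size_t height, size_t rowBytes,
                         void *outPixels, size_t outRowBytes)
{
    vImage_Buffer src  = { pixels,    height, width, rowBytes    };
    vImage_Buffer dest = { outPixels, height, width, outRowBytes };

    // Rows are source channels (A, R, G, B), columns are destination channels,
    // so each output colour channel receives 0.299 R + 0.587 G + 0.114 B.
    const int16_t matrix[16] = {
        1000,    0,    0,    0,   // A -> A
           0,  299,  299,  299,   // R
           0,  587,  587,  587,   // G
           0,  114,  114,  114,   // B
    };
    vImageMatrixMultiply_ARGB8888(&src, &dest, matrix, 1000, NULL, NULL, kvImageNoFlags);
}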

CIFilter
  • Thanks @LucasTizma. I'll take a look at vImage. However, my goal is to have faster than real-time processing. That's why I only wanted to do every 10th frame. So imagine the video is already recorded on the phone and now I want to try scanning. Does that rule out vImage? – simon.d Feb 21 '12 at 06:55
  • vImage is just a means to quickly perform image processing operations. I think you'd be okay. Seemingly, other than OpenGL, this is your fastest possible solution. Others, feel free to correct me if I'm wrong. – CIFilter Feb 22 '12 at 02:00
  • but is vImage only useful once I've decoded the frame? If so, I'm not sure I need it. 90% of the work is actually decoding the frame, not processing the pixels. – simon.d Feb 23 '12 at 19:15

Assuming the bottleneck of your application is in the code that converts the video frames to a displayable format (like RGB), you might be interested in some code I shared that converts a single .mp4 frame (encoded as YV12) to RGB using Qt and OpenGL. The application uploads the frame to the GPU and runs a GLSL fragment shader to do the conversion from YV12 to RGB, so the result can be displayed in a QImage.

static const char *p_s_fragment_shader =
    "#extension GL_ARB_texture_rectangle : enable\n"
    "uniform sampler2DRect tex;"
    "uniform float ImgHeight, chromaHeight_Half, chromaWidth;" // geometry of the luma and chroma planes inside the texture
    "void main()"
    "{"
    "    vec2 t = gl_TexCoord[0].xy;" // get texcoord from fixed-function pipeline
    "    float CbY = ImgHeight + floor(t.y / 4.0);"
    "    float CrY = ImgHeight + chromaHeight_Half + floor(t.y / 4.0);"
    "    float CbCrX = floor(t.x / 2.0) + chromaWidth * floor(mod(t.y, 2.0));"
    "    float Cb = texture2DRect(tex, vec2(CbCrX, CbY)).x - .5;"
    "    float Cr = texture2DRect(tex, vec2(CbCrX, CrY)).x - .5;"
    "    float y = texture2DRect(tex, t).x;" // redundant texture read optimized away by texture cache
    "    float r = y + 1.28033 * Cr;" // ITU-R BT.709 YCbCr -> RGB conversion
    "    float g = y - .21482 * Cb - .38059 * Cr;"
    "    float b = y + 2.12798 * Cb;"
    "    gl_FragColor = vec4(r, g, b, 1.0);"
    "}";
karlphillip