
At a high level, I created an app that lets a user point the iPhone camera around and see video frames that have been processed with visual effects. Additionally, the user can tap a button to take a freeze-frame of the current preview as a high-resolution photo that is saved to their iPhone library.

To do this, the app follows this procedure:

1) Create an AVCaptureSession

captureSession = [[AVCaptureSession alloc] init];
[captureSession setSessionPreset:AVCaptureSessionPreset640x480];

2) Hook up an AVCaptureDeviceInput using the back-facing camera.

videoInput = [[[AVCaptureDeviceInput alloc] initWithDevice:backFacingCamera error:&error] autorelease];
[captureSession addInput:videoInput];
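
(For reference, backFacingCamera above comes from picking the rear device out of the available video capture devices; a minimal sketch of that lookup, since it isn't shown here:)

// Find the back-facing camera by enumerating the video capture devices.
AVCaptureDevice *backFacingCamera = nil;
for (AVCaptureDevice *device in [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo])
{
    if ([device position] == AVCaptureDevicePositionBack)
    {
        backFacingCamera = device;
    }
}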

3) Hook up an AVCaptureStillImageOutput to the session to be able to capture still frames at Photo resolution.

stillOutput = [[AVCaptureStillImageOutput alloc] init];
[stillOutput setOutputSettings:[NSDictionary
    dictionaryWithObject:[NSNumber numberWithInt:kCVPixelFormatType_32BGRA]
    forKey:(id)kCVPixelBufferPixelFormatTypeKey]];
[captureSession addOutput:stillOutput];
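
(The actual still capture isn't shown here; when the user taps the button, the freeze-frame is requested from this output roughly along these lines, with the connection lookup and completion handling simplified:)

// Find the video connection on the still image output and ask it for a frame.
AVCaptureConnection *videoConnection = nil;
for (AVCaptureConnection *connection in [stillOutput connections])
{
    for (AVCaptureInputPort *port in [connection inputPorts])
    {
        if ([[port mediaType] isEqual:AVMediaTypeVideo])
        {
            videoConnection = connection;
        }
    }
}

[stillOutput captureStillImageAsynchronouslyFromConnection:videoConnection
    completionHandler:^(CMSampleBufferRef imageSampleBuffer, NSError *error) {
        // With the 32BGRA output settings above, the sample buffer wraps a pixel buffer.
        CVImageBufferRef stillBuffer = CMSampleBufferGetImageBuffer(imageSampleBuffer);
        // ... convert stillBuffer to a UIImage and save it to the photo library
    }];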

4) Hook up an AVCaptureVideoDataOutput to the session to be able to capture individual video frames (CVImageBuffers) at a lower resolution.

videoOutput = [[AVCaptureVideoDataOutput alloc] init];
[videoOutput setVideoSettings:[NSDictionary dictionaryWithObject:[NSNumber numberWithInt:kCVPixelFormatType_32BGRA] forKey:(id)kCVPixelBufferPixelFormatTypeKey]];
[videoOutput setSampleBufferDelegate:self queue:dispatch_get_main_queue()];
[captureSession addOutput:videoOutput];

5) As video frames are captured, the delegate's method is called with each new frame as a CVImageBuffer:

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    [self.delegate processNewCameraFrame:pixelBuffer];
}

6) Then the delegate processes/draws them:

- (void)processNewCameraFrame:(CVImageBufferRef)cameraFrame {
    CVPixelBufferLockBaseAddress(cameraFrame, 0);
    int bufferHeight = CVPixelBufferGetHeight(cameraFrame);
    int bufferWidth = CVPixelBufferGetWidth(cameraFrame);

    glClear(GL_COLOR_BUFFER_BIT);

    glGenTextures(1, &videoFrameTexture_);
    glBindTexture(GL_TEXTURE_2D, videoFrameTexture_);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, bufferWidth, bufferHeight, 0, GL_BGRA, GL_UNSIGNED_BYTE, CVPixelBufferGetBaseAddress(cameraFrame));

    glBindBuffer(GL_ARRAY_BUFFER, [self vertexBuffer]);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, [self indexBuffer]);

    glDrawElements(GL_TRIANGLE_STRIP, 4, GL_UNSIGNED_SHORT, BUFFER_OFFSET(0));

    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
    [[self context] presentRenderbuffer:GL_RENDERBUFFER];

    glDeleteTextures(1, &videoFrameTexture_);

    CVPixelBufferUnlockBaseAddress(cameraFrame, 0);
}

This all works and leads to the correct results. I can see a video preview of 640x480 processed through OpenGL. It looks like this:

640x480 Correct Preview

However, if I capture a still image from this session, its resolution will also be 640x480. I want it to be high resolution, so in step one I change the preset line to:

[captureSession setSessionPreset:AVCaptureSessionPresetPhoto];

This correctly captures still images at the highest resolution for the iPhone 4 (2592x1936).

However, the video preview (as received by the delegate in steps 5 and 6) now looks like this:

Photo preview incorrect

I've confirmed that every other preset (High, Medium, Low, 640x480, and 1280x720) previews as intended. However, the Photo preset seems to send buffer data in a different format.

I've also confirmed that the data being sent to the buffer at the Photo preset is actually valid image data by taking the buffer and creating a UIImage out of it instead of sending it to OpenGL:

size_t bytesPerRow = CVPixelBufferGetBytesPerRow(cameraFrame);
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
CGContextRef context = CGBitmapContextCreate(CVPixelBufferGetBaseAddress(cameraFrame), bufferWidth, bufferHeight, 8, bytesPerRow, colorSpace, kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedFirst);
CGImageRef cgImage = CGBitmapContextCreateImage(context);
UIImage *anImage = [UIImage imageWithCGImage:cgImage];
CGImageRelease(cgImage);
CGContextRelease(context);
CGColorSpaceRelease(colorSpace);

This shows an undistorted video frame.

I've done a bunch of searching and can't seem to fix it. My hunch is that it's a data format issue. That is, I believe that the buffer is being set correctly, but with a format that this line doesn't understand:

glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, bufferWidth, bufferHeight, 0, GL_BGRA, GL_UNSIGNED_BYTE, CVPixelBufferGetBaseAddress(cameraFrame));

My hunch was that changing the external format from GL_BGRA to something else would help, but it doesn't... and from various checks it looks like the buffer data really is in BGRA order.

Does anyone know what's going on here? Or do you have any tips on how I might go about debugging why this is happening? (What's especially strange is that this happens on an iPhone 4 but not on an iPhone 3GS, both running iOS 4.3.)

– sotangochips
  • Thank you for your question and answer. It was really helpful! I want to add a small correction: **AVCaptureStillImageOutput** should have these output settings to capture an image: `[NSDictionary dictionaryWithObjectsAndKeys:AVVideoCodecJPEG, AVVideoCodecKey, nil]` – Martin Pilch Feb 21 '12 at 12:53

8 Answers


This was a doozy.

As Lio Ben-Kereth pointed out, the padding is 48 bytes, as you can see from the debugger:

(gdb) po pixelBuffer
<CVPixelBuffer 0x2934d0 width=852 height=640 bytesPerRow=3456 pixelFormat=BGRA
# => 3456 - 852 * 4 = 48

Desktop OpenGL can compensate for this (via GL_UNPACK_ROW_LENGTH), but OpenGL ES cannot (more info here: OpenGL SubTexturing).

So here is how I'm doing it in OpenGL ES:

(CVImageBufferRef)pixelBuffer   // pixelBuffer containing the raw image data is passed in

/* ... */
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, videoFrameTexture_);

int frameWidth = CVPixelBufferGetWidth(pixelBuffer);
int frameHeight = CVPixelBufferGetHeight(pixelBuffer);

size_t bytesPerRow, extraBytes;

bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);
extraBytes = bytesPerRow - frameWidth*4;

GLubyte *pixelBufferAddr = CVPixelBufferGetBaseAddress(pixelBuffer);

if ( [[captureSession sessionPreset] isEqualToString:@"AVCaptureSessionPresetPhoto"] )
{

    glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA, frameWidth, frameHeight, 0, GL_BGRA, GL_UNSIGNED_BYTE, NULL );

    for( int h = 0; h < frameHeight; h++ )
    {
        GLubyte *row = pixelBufferAddr + h * (frameWidth * 4 + extraBytes);
        glTexSubImage2D( GL_TEXTURE_2D, 0, 0, h, frameWidth, 1, GL_BGRA, GL_UNSIGNED_BYTE, row );
    }
}
else
{
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, frameWidth, frameHeight, 0, GL_BGRA, GL_UNSIGNED_BYTE, pixelBufferAddr);
}

Before, I was using AVCaptureSessionPresetMedium and getting 30fps. In AVCaptureSessionPresetPhoto I'm getting 16fps on an iPhone 4. The looping for the sub-texture does not seem to affect the frame rate.

I'm using an iPhone 4 on iOS 5.

– Dex
  • It works fine, but the result is a little blurry... I don't know if it is asking too much, but is there any way to improve the image quality? – Carles Estevadeordal Jan 17 '12 at 12:59
  • Are you using photo or video? The video is always a bit noisy, but there shouldn't be any problems as a result of this code, it is just processing the raw bitmaps. If you do want to take away the blurring, you'll need to apply a sharpening filter. – Dex Jan 19 '12 at 02:14
  • I should point out that tastyone's newer solution also appears to correct for this behavior on the iPhone 4 and 4S with the photo preset, while avoiding the moderately expensive loop here. Also, I've found that the iOS 5.0 texture caches don't exhibit these artifacts, so you could use those for even better video performance. – Brad Larson Apr 01 '12 at 03:23
  • Nice. There is also a 1-2fps improvement in my tests by using @tastyone's method in iOS 5. – Dex Apr 01 '12 at 07:05
  • If you're on iOS 5.0, I highly recommend looking into the texture caches, which aren't affected by this issue and are much faster than using `glTexImage2D()`: http://stackoverflow.com/a/9574798/19679 . I also rolled all of this into my GPUImage framework, which provides a nice abstraction for accelerated image and video processing. – Brad Larson Apr 03 '12 at 17:19

Just draw it like this:

size_t bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);
int frameHeight = CVPixelBufferGetHeight(pixelBuffer);

GLubyte *pixelBufferAddr = CVPixelBufferGetBaseAddress(pixelBuffer);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, (GLsizei)bytesPerRow / 4, (GLsizei)frameHeight, 0, GL_BGRA, GL_UNSIGNED_BYTE, pixelBufferAddr);
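
Note that this uploads the row padding as extra texels on the right-hand edge of the texture, so presumably the horizontal texture coordinate should be scaled so that only the real image is sampled; a minimal sketch of that adjustment:

// Only frameWidth of the (bytesPerRow / 4) uploaded texels per row are real image
// data, so scale the horizontal (S) texture coordinate accordingly when drawing.
int frameWidth = (int)CVPixelBufferGetWidth(pixelBuffer);
GLfloat maxS = (GLfloat)frameWidth / (GLfloat)(bytesPerRow / 4);

GLfloat textureCoordinates[] = {
    0.0f, 1.0f,
    maxS, 1.0f,
    0.0f, 0.0f,
    maxS, 0.0f,
};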
– Sangwon Park
  • I can confirm that this works for video feeds from an iPhone 4 and 4S on iOS 5.0 with the AVCaptureSessionPresetPhoto preset. I needed to correct the above to add a division by four in order to put this in terms of pixels. Otherwise, you get a segmentation fault. – Brad Larson Apr 01 '12 at 03:20

The sessionPresetPhoto is the setting for capturing a photo at the highest quality. When we use AVCaptureStillImageOutput with the photo preset, the frames captured from the video stream always have exactly the resolution of the iPad or iPhone screen. I had the same problem with an iPad Pro 12.9 inch, which has a 2732 x 2048 resolution: the frame I captured from the video stream was 2732 x 2048, but it was always distorted and shifted. I tried the above-mentioned solutions, but they did not solve my problem.

Finally, I realised that the width of the frame should always be divisible by 8, which 2732 is not (2732 / 8 = 341.5). So what I did was check the width modulo 8 and, if the remainder is not zero, round the width up to the next multiple of 8. In this case 2732 % 8 = 4, which gives 2732 + 4 = 2736. I then use this frame width in CVPixelBufferCreate to initialise my pixelBuffer (CVPixelBufferRef).
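
A minimal sketch of that width adjustment, assuming the pixel buffer is created with CVPixelBufferCreate (the variable names here are illustrative):

// Round the frame width up to the next multiple of 8 before creating the buffer.
size_t frameWidth = 2732;    // e.g. the raw width reported for the iPad Pro 12.9"
size_t frameHeight = 2048;
size_t paddedWidth = (frameWidth % 8 == 0) ? frameWidth : frameWidth + (8 - frameWidth % 8);

CVPixelBufferRef pixelBuffer = NULL;
CVReturn status = CVPixelBufferCreate(kCFAllocatorDefault,
                                      paddedWidth,
                                      frameHeight,
                                      kCVPixelFormatType_32BGRA,
                                      NULL,
                                      &pixelBuffer);
if (status != kCVReturnSuccess)
{
    // Handle the allocation failure.
}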

– Tatsuyuki Ishi

Good point, Mats. But as a matter of fact the padding is larger; it's:

bytesPerRow = 4 * bufferWidth + 48;

This works great on the iPhone 4 back camera and solves the issue sotangochips reported.

  • This seems to work for me when reading a PNG from the file system that I had created in Photoshop as part of the app. I'm not 100% sure why, but it seems to work perfectly, which makes me happy! – James Hornitzky Sep 23 '14 at 14:07

Dex, thanks for the excellent answer. To make your code more generic, I would replace:

if ( [[captureSession sessionPreset] isEqualToString:@"AVCaptureSessionPresetPhoto"] )

with

if ( extraBytes > 0 )
– greg_p
  • The only thing is, we are assuming all video modes will be packed the same way. iOS 5 has added some additional video modes and I have no idea if there are any quirks associated with them as well. – Dex Nov 07 '11 at 00:38

I think I found your answer, and I'm sorry because it is not good news.

You can check this link: http://developer.apple.com/library/mac/#documentation/AudioVideo/Conceptual/AVFoundationPG/Articles/04_MediaCapture.html

Configuring a Session

Symbol: AVCaptureSessionPresetPhoto
Resolution: Photo.
Comments: Full photo resolution. This is not supported for video output.

– Jwlyan

Use this size everywhere in your code:

// Round the width down to a multiple of 16 and scale the height to keep the aspect ratio.
int width_16 = (int)yourImage.size.width - (int)yourImage.size.width % 16;
int height_ = (int)(yourImage.size.height / yourImage.size.width * width_16);
CGSize video_size_ = CGSizeMake(width_16, height_);
– jjpp

The image buffer you get seems to contain some padding at the end of each row. For example:

bytesPerRow = 4 * bufferWidth + 12;

This is often done so that each pixel row starts at a 16-byte boundary.
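
Rather than hard-coding the padding, the actual stride can be read from the buffer itself; a minimal sketch:

// Query the real stride instead of assuming bytesPerRow == 4 * bufferWidth.
size_t bytesPerRow = CVPixelBufferGetBytesPerRow(cameraFrame);
size_t bufferWidth = CVPixelBufferGetWidth(cameraFrame);
size_t padding = bytesPerRow - bufferWidth * 4;   // extra bytes at the end of each row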

– Mats
  • I have a 360x270 video which AVFoundation outputs with 1472 bytes stride (instead of 360*4=1440). Both 1440 and 1472 are divisible by 16 and even 32. Seems similar to AVCapture* implementation. – AndiDog Mar 26 '13 at 08:43