
I've managed to create an app that receives a live H.264 encoded video stream and then decodes and displays the video with Video Toolbox and AVSampleBufferDisplayLayer. This works as expected, but I want to be able to apply filters to the rendered output, so I switched to decoding with Video Toolbox and displaying/rendering the decoded video with MetalKit. The only problem is that my rendered output with MetalKit is noticeably blurrier than the output from AVSampleBufferDisplayLayer, and I haven't managed to find out why.

Here's the output from AVSampleBufferDisplayLayer:

Here's the output from MetalKit:

I've tried skipping MetalKit and rendering directly to a CAMetalLayer, but the same issue persists. I'm in the middle of trying to convert my CVImageBufferRef to a UIImage that I can display with a UIView. If that also ends up blurry, then maybe the issue is with my VTDecompressionSession and not with the Metal side of things.

The decoding part is pretty much the same as what's described in How to use VideoToolbox to decompress H.264 video stream.

I'll try to just paste the interesting snippets of my code.

These are the options I give my VTDecompressionSession.

NSDictionary *destinationImageBufferAttributes = [NSDictionary dictionaryWithObjectsAndKeys:
                                                      [NSNumber numberWithInteger:kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange],
                                                      (id)kCVPixelBufferPixelFormatTypeKey,
                                                      nil];
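
For context, here's a minimal sketch of how these attributes would typically be passed when the session is created. The formatDescription and callbackRecord variables are assumptions here; they would be built as in the linked decoding walkthrough, which I haven't pasted.

// Sketch only: formatDescription (CMVideoFormatDescriptionRef built from the SPS/PPS)
// and callbackRecord (VTDecompressionOutputCallbackRecord) come from the decoding
// setup that isn't shown here.
VTDecompressionSessionRef decompressionSession = NULL;
OSStatus status = VTDecompressionSessionCreate(kCFAllocatorDefault,
                                               formatDescription,
                                               NULL, // decoder specification
                                               (__bridge CFDictionaryRef)destinationImageBufferAttributes,
                                               &callbackRecord,
                                               &decompressionSession);
if (status != noErr)
    NSLog(@"VTDecompressionSessionCreate failed with status %d", (int)status);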

This is my view that inherits from MTKView

@interface StreamView : MTKView

@property id<MTLCommandQueue> commandQueue;
@property id<MTLBuffer> vertexBuffer;
@property id<MTLBuffer> colorConversionBuffer;
@property id<MTLRenderPipelineState> pipeline;
@property CVMetalTextureCacheRef textureCache;

@property CFMutableArrayRef imageBuffers;

-(id)initWithRect:(CGRect)rect withDelay:(int)delayInFrames;
-(void)addToRenderQueue:(CVPixelBufferRef)image renderAt:(int)frame;

@end

This is how I initialize the view from my view controller. The video I receive is the same size, 666x374.

streamView = [[StreamView alloc] initWithRect:CGRectMake(0, 0, 666, 374) withDelay:0];
[self.view addSubview:streamView];

This is the content of the StreamView's initWithRect method

id<MTLDevice> device = MTLCreateSystemDefaultDevice();
self = [super initWithFrame:rect device:device];

self.colorPixelFormat = MTLPixelFormatBGRA8Unorm;
self.commandQueue = [self.device newCommandQueue];
[self buildTextureCache];
[self buildPipeline];
[self buildVertexBuffers];
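
The buildTextureCache method isn't shown above; as a rough sketch, presumably it just creates the CVMetalTextureCache that getTexture: uses later, something like:

- (void)buildTextureCache
{
    // Create the texture cache used to wrap CVPixelBuffer planes as MTLTextures.
    CVReturn status = CVMetalTextureCacheCreate(kCFAllocatorDefault, NULL, self.device, NULL, &_textureCache);
    if (status != kCVReturnSuccess)
        NSLog(@"CVMetalTextureCacheCreate failed with status %d", status);
}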

This is the buildPipeline method

- (void)buildPipeline
{
    NSBundle *bundle = [NSBundle bundleForClass:[self class]];
    id<MTLLibrary> library = [self.device newDefaultLibraryWithBundle:bundle error:NULL];

    id<MTLFunction> vertexFunc = [library newFunctionWithName:@"vertex_main"];
    id<MTLFunction> fragmentFunc = [library newFunctionWithName:@"fragment_main"];

    MTLRenderPipelineDescriptor *pipelineDescriptor = [MTLRenderPipelineDescriptor new];
    pipelineDescriptor.vertexFunction = vertexFunc;
    pipelineDescriptor.fragmentFunction = fragmentFunc;
    pipelineDescriptor.colorAttachments[0].pixelFormat = self.colorPixelFormat;

    self.pipeline = [self.device newRenderPipelineStateWithDescriptor:pipelineDescriptor error:NULL];
}
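
buildVertexBuffers isn't shown either. Since vertex_main isn't posted, the exact vertex layout is a guess, but for a full-screen quad drawn as a four-vertex triangle strip it would look roughly like this (interleaved float4 position and float2 texcoord are an assumption):

- (void)buildVertexBuffers
{
    // Assumed layout: float4 clip-space position followed by float2 texture coordinate.
    static const float quadVertices[] = {
        // x,    y,    z,    w,    u,    v
        -1.0, -1.0,  0.0,  1.0,  0.0,  1.0,
        -1.0,  1.0,  0.0,  1.0,  0.0,  0.0,
         1.0, -1.0,  0.0,  1.0,  1.0,  1.0,
         1.0,  1.0,  0.0,  1.0,  1.0,  0.0,
    };
    self.vertexBuffer = [self.device newBufferWithBytes:quadVertices
                                                 length:sizeof(quadVertices)
                                                options:MTLResourceStorageModeShared];
}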

Here is how I actually draw my texture

CVImageBufferRef image = (CVImageBufferRef)CFArrayGetValueAtIndex(_imageBuffers, 0);

id<MTLTexture> textureY = [self getTexture:image pixelFormat:MTLPixelFormatR8Unorm planeIndex:0];
id<MTLTexture> textureCbCr = [self getTexture:image pixelFormat:MTLPixelFormatRG8Unorm planeIndex:1];
if(textureY == NULL || textureCbCr == NULL)
   return;

id<CAMetalDrawable> drawable = self.currentDrawable;

id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
MTLRenderPassDescriptor *renderPass = self.currentRenderPassDescriptor;
renderPass.colorAttachments[0].clearColor = MTLClearColorMake(0.5, 1, 0.5, 1);

id<MTLRenderCommandEncoder> commandEncoder = [commandBuffer renderCommandEncoderWithDescriptor:renderPass];
[commandEncoder setRenderPipelineState:self.pipeline];
[commandEncoder setVertexBuffer:self.vertexBuffer offset:0 atIndex:0];
[commandEncoder setFragmentTexture:textureY atIndex:0];
[commandEncoder setFragmentTexture:textureCbCr atIndex:1];
[commandEncoder setFragmentBuffer:_colorConversionBuffer offset:0 atIndex:0];
[commandEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip vertexStart:0 vertexCount:4 instanceCount:1];
[commandEncoder endEncoding];

[commandBuffer presentDrawable:drawable];
[commandBuffer commit];

This is how I convert a CVPixelBufferRef into an MTLTexture

- (id<MTLTexture>)getTexture:(CVPixelBufferRef)image pixelFormat:(MTLPixelFormat)pixelFormat planeIndex:(int)planeIndex {
    id<MTLTexture> texture;
    size_t width, height;

    if (planeIndex == -1)
    {
        width = CVPixelBufferGetWidth(image);
        height = CVPixelBufferGetHeight(image);
        planeIndex = 0;
    }
    else
    {
        width = CVPixelBufferGetWidthOfPlane(image, planeIndex);
        height = CVPixelBufferGetHeightOfPlane(image, planeIndex);
        NSLog(@"texture %d, %ld, %ld", planeIndex, width, height);
    }

    CVMetalTextureRef textureRef = NULL;
    CVReturn status = CVMetalTextureCacheCreateTextureFromImage(NULL, _textureCache, image, NULL, pixelFormat, width, height, planeIndex, &textureRef);
    if(status == kCVReturnSuccess)
    {
        texture = CVMetalTextureGetTexture(textureRef);
        CFRelease(textureRef);
    }
    else
    {
        NSLog(@"CVMetalTextureCacheCreateTextureFromImage failed with return stats %d", status);
        return NULL;
    }

    return texture;
}

This is my fragment shader

fragment float4 fragment_main(Varyings in [[ stage_in ]],
                              texture2d<float, access::sample> textureY [[ texture(0) ]],
                              texture2d<float, access::sample> textureCbCr [[ texture(1) ]],
                              constant ColorConversion &colorConversion [[ buffer(0) ]])
{
    constexpr sampler s(address::clamp_to_edge, filter::linear);
    float3 ycbcr = float3(textureY.sample(s, in.texcoord).r, textureCbCr.sample(s, in.texcoord).rg);

    float3 rgb = colorConversion.matrix * (ycbcr + colorConversion.offset);

    return float4(rgb, 1.0);
}
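
The ColorConversion struct and the contents of _colorConversionBuffer aren't shown above. For completeness, here's a hedged sketch of the CPU-side counterpart, assuming BT.709 video-range constants and a layout that matches the shader's matrix and offset fields (the stream could just as well be BT.601, in which case the coefficients differ):

#import <simd/simd.h>

typedef struct {
    matrix_float3x3 matrix;
    vector_float3 offset;
} ColorConversion;

- (void)buildColorConversionBuffer
{
    // BT.709 video-range YCbCr -> RGB, matching kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange.
    ColorConversion conversion = {
        .matrix = {
            .columns[0] = { 1.164f,  1.164f, 1.164f },
            .columns[1] = { 0.0f,   -0.213f, 2.112f },
            .columns[2] = { 1.793f, -0.533f, 0.0f   },
        },
        .offset = { -16.0f / 255.0f, -0.5f, -0.5f },
    };
    self.colorConversionBuffer = [self.device newBufferWithBytes:&conversion
                                                          length:sizeof(conversion)
                                                         options:MTLResourceStorageModeShared];
}

Note that this still leaves the RGB result gamma-encoded, which is part of what the comments below are about.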

Because the view and the video I encode are both 666x374, I tried changing the sampling type in the fragment shader to filter::nearest. I thought it would match the pixels 1:1, but it was still just as blurry. Another weird thing I noticed is that if you open the uploaded images in a new tab, you'll see that they are much larger than 666x374... I doubt that I'm making a mistake on the encoding side, and even if I were, AVSampleBufferDisplayLayer still manages to display the video without blur, so it must be doing something right that I'm missing.

AMVaddictionist
  • Are you rendering to a MTKView that is exactly 666x374? To properly render YCbCr you need to render into an intermediate texture to hold linear values and then render that linear set of values into the resized MTKView. Otherwise, you will be doing linear interpolation on non-linear values during the rescale. Also, it looks like you are not adjusting for gamma encoding when converting to RGB. – MoDJ Apr 06 '19 at 23:46
  • See this SO question: https://stackoverflow.com/questions/53911662/does-h-264-encoded-video-with-bt-709-matrix-include-any-gamma-adjustment – MoDJ Apr 06 '19 at 23:47
  • To be honest, both the scaling and gamma issues are very difficult to deal with; to simplify your code, I suggest that you configure the AVPlayerItemVideoOutput object to return RGB values as opposed to YCbCr. Otherwise, use an existing library implementation that has already solved these problems as opposed to trying to do it yourself. – MoDJ Apr 06 '19 at 23:50
  • Telling `VTDecompressionSessionCreate` to return an image buffer with the pixel format of `kCVPixelFormatType_32BGRA` and rendering that is giving me identical results to the blurry output of YCbCr. I don't think the issue lies within my conversion from YCbCr to RGB, unless VideoToolbox Decompression is also incorrect. – AMVaddictionist Apr 08 '19 at 07:35
  • You are getting BGRA pixels and then doing linear interpolation on the pixels as an sRGB texture? If you are doing that then the results should not be identical, since the approach you posted above uses filter::linear directly on the non-linear YCbCr values. One way you can debug this is to capture the BGRA texture before the scaling operation, to make extra sure that the results are not blurry or weird before the scale-to-view-dimensions operation. – MoDJ Apr 08 '19 at 08:00
  • This may be really obvious, but points on the view are typically 2x larger than the width and height of the video, so you might be rendering into a view that is not actually the dimensions of the video. I see that you put CGRectMake(0, 0, 666, 374) in your code but on a 2x screen this will not exactly match a video that is 666x374 pixels. – MoDJ Apr 08 '19 at 08:05
  • The comment about the view size was not obvious for me, that explains why my screenshots are larger than my CGRect size. Thanks. – AMVaddictionist Apr 08 '19 at 08:23
  • "so you might be rendering into a view that is not actually the dimensions of the video" You were right on the money with this line. I sent a video that was double the size of my CGRectMake and it's not blurry at all, but I would like to be able to have it working well with arbitrary video dimensions like AVSampleBufferDisplayLayer. Are you saying that if I convert my texture to linear space then the scaling will be well done and not blurry? I'm not sure if I have understood your comments correctly, I've looked up a ton of stuff when making this app but I'm still new to these things. – AMVaddictionist Apr 08 '19 at 09:13

1 Answer


It looks like you have the most serious issue, the view scale, addressed. The other issues are proper YCbCr rendering (which it sounds like you are going to avoid by outputting BGRA pixels when decoding) and then scaling the original movie to match the dimensions of the view. When you request BGRA pixel data, the data is encoded as sRGB, so you should treat the data in the texture as sRGB. Metal will automatically do the non-linear to linear conversion for you when reading from an sRGB texture, but you have to tell Metal that it is sRGB pixel data (using MTLPixelFormatBGRA8Unorm_sRGB). To implement scaling, you just need to render from the BGRA data into the view with linear resampling. See the SO question I linked above if you want to have a look at the source code for MetalBT709Decoder, which is my own project that implements proper rendering of BT.709.
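
To make the BGRA/sRGB suggestion concrete, here is a rough sketch under those assumptions (the decompression session returns kCVPixelFormatType_32BGRA, and the posted getTexture: is reused with planeIndex -1); this is not code from MetalBT709Decoder:

// Ask the decompression session for BGRA output instead of bi-planar YCbCr.
NSDictionary *destinationImageBufferAttributes = @{
    (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA),
    (id)kCVPixelBufferMetalCompatibilityKey : @YES
};

// Wrap the single BGRA plane as an sRGB texture so Metal linearizes the values
// before the bilinear resample into the view; the fragment shader would then
// sample this one texture and skip the YCbCr matrix multiply.
id<MTLTexture> textureBGRA = [self getTexture:image
                                  pixelFormat:MTLPixelFormatBGRA8Unorm_sRGB
                                   planeIndex:-1];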

MoDJ
  • I don't want to be annoying considering all the help you're giving me, but changing to MTLPixelFormatBGRA8Unorm_sRGB and using linear sampling is not making it less blurry. I don't think I'm doing anything really _wrong_ per se. After learning about the difference between points and pixels, it's not so weird that my texture gets blurry, because I double its size when rendering (1334x750 display). But I don't understand how AVSampleBufferDisplayLayer manages to make it look so good. Doubling a screenshot of my image in a photo-editing software gives worse results than AVSampleBufferDisplayLayer. – AMVaddictionist Apr 09 '19 at 13:03
  • Linear interpolation doesn't seem that sophisticated, so I don't think I can expect better results from it. Or should the upsampling give better results and I'm doing something wrong after all? I mostly skimmed through the rendering part of your project but I haven't found anything that can help me yet; maybe I will after looking through it some more. – AMVaddictionist Apr 09 '19 at 13:13
  • I was under the impression that you addressed the blurry problem by adjusting the view size in points to match a multiple of the pixel buffer dimensions in pixels? The pixel representation as sRGB is an additional issue after you have the scale issue fixed. Telling Metal that the pixels are sRGB means that the values are encoded with sRGB gamma, as opposed to straight linear values. You can see the diff with a high contrast image. – MoDJ Apr 09 '19 at 21:45
  • I was under the impression that I had not fixed it, but double checking now I apparently had? One of my tests must have been wrong somewhere... Thanks for the help and for being patient with me! – AMVaddictionist Apr 10 '19 at 07:50
  • You are attempting to solve one of the most difficult problems on iOS. I just happened to be working in this same area and started a month or two ahead of you; I am working on a new library that provides the same functionality but also adds alpha channel support and optimized YCbCr decoding. – MoDJ Apr 10 '19 at 08:26