I've managed to create an app that receives a live H.264-encoded video stream, then decodes and displays it using Video Toolbox and AVSampleBufferDisplayLayer. This works as expected, but I want to be able to apply filters to the rendered output, so I switched to decoding with Video Toolbox and displaying/rendering the decoded video with MetalKit. The only problem is that my rendered output with MetalKit is noticeably blurrier than the output from AVSampleBufferDisplayLayer, and I haven't managed to find out why.
Here's the output from AVSampleBufferDisplayLayer
Here's the output from MetalKit
I've tried skipping MetalKit and rendering directly to a CAMetalLayer, but the same issue persists. I'm currently trying to convert my CVImageBufferRef to a UIImage that I can display in a UIView. If that also ends up blurry, then maybe the issue is with my VTDecompressionSession rather than on the Metal side of things.
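The conversion I have in mind goes through Core Image, roughly like this (a sketch only; imageBuffer stands for the decoded frame, and in real code the CIContext would be created once and reused):
// Sketch: CVImageBufferRef -> CIImage -> CGImage -> UIImage
CIImage *ciImage = [CIImage imageWithCVPixelBuffer:imageBuffer];
CIContext *ciContext = [CIContext contextWithOptions:nil];
CGImageRef cgImage = [ciContext createCGImage:ciImage
                                     fromRect:CGRectMake(0, 0,
                                                         CVPixelBufferGetWidth(imageBuffer),
                                                         CVPixelBufferGetHeight(imageBuffer))];
UIImage *uiImage = [UIImage imageWithCGImage:cgImage];
CGImageRelease(cgImage);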
The decoding part is pretty much the same as what's described in How to use VideoToolbox to decompress H.264 video stream.
I'll try to just paste the interesting snippets of my code.
These are the options I give my VTDecompressionSession.
NSDictionary *destinationImageBufferAttributes = [NSDictionary dictionaryWithObjectsAndKeys:
    [NSNumber numberWithInteger:kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange],
    (id)kCVPixelBufferPixelFormatTypeKey,
    nil];
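For context, these attributes are passed as the destinationImageBufferAttributes when the session is created, roughly like this (a simplified sketch; formatDescription is the CMVideoFormatDescriptionRef built from the stream's SPS/PPS, and decompressionOutputCallback is my output callback that receives the decoded CVImageBufferRefs):
// Sketch of the session setup (simplified).
VTDecompressionOutputCallbackRecord callbackRecord;
callbackRecord.decompressionOutputCallback = decompressionOutputCallback;
callbackRecord.decompressionOutputRefCon = (__bridge void *)self;

VTDecompressionSessionRef decompressionSession = NULL;
OSStatus status = VTDecompressionSessionCreate(kCFAllocatorDefault,
                                               formatDescription,
                                               NULL,
                                               (__bridge CFDictionaryRef)destinationImageBufferAttributes,
                                               &callbackRecord,
                                               &decompressionSession);
if (status != noErr)
    NSLog(@"VTDecompressionSessionCreate failed with status %d", (int)status);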
This is my view that inherits from MTKView
@interface StreamView : MTKView
@property id<MTLCommandQueue> commandQueue;
@property id<MTLBuffer> vertexBuffer;
@property id<MTLBuffer> colorConversionBuffer;
@property id<MTLRenderPipelineState> pipeline;
@property CVMetalTextureCacheRef textureCache;
@property CFMutableArrayRef imageBuffers;
-(id)initWithRect:(CGRect)rect withDelay:(int)delayInFrames;
-(void)addToRenderQueue:(CVPixelBufferRef)image renderAt:(int)frame;
@end
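addToRenderQueue: isn't shown here; conceptually it just stores the decoded frame in imageBuffers so the draw code further down can pick it up. Simplified (ignoring the delay/frame bookkeeping), it amounts to something like:
// Simplified sketch, assuming imageBuffers was created with kCFTypeArrayCallBacks
// so that appending retains the pixel buffer until it has been rendered.
- (void)addToRenderQueue:(CVPixelBufferRef)image renderAt:(int)frame
{
    CFArrayAppendValue(self.imageBuffers, image);
}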
This is how I initialize the view from my view controller. The video I receive is the same size, 666x374.
streamView = [[StreamView alloc] initWithRect:CGRectMake(0, 0, 666, 374) withDelay:0];
[self.view addSubview:streamView];
This is the content of the StreamView's initWithRect method
id<MTLDevice> device = MTLCreateSystemDefaultDevice();
self = [super initWithFrame:rect device:device];
self.colorPixelFormat = MTLPixelFormatBGRA8Unorm;
self.commandQueue = [self.device newCommandQueue];
[self buildTextureCache];
[self buildPipeline];
[self buildVertexBuffers];
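buildTextureCache and buildVertexBuffers aren't shown; the texture cache part boils down to a single CVMetalTextureCacheCreate call, roughly:
// Sketch: create the CVMetalTextureCache for this device.
- (void)buildTextureCache
{
    CVReturn status = CVMetalTextureCacheCreate(kCFAllocatorDefault, NULL, self.device, NULL, &_textureCache);
    if (status != kCVReturnSuccess)
        NSLog(@"CVMetalTextureCacheCreate failed with status %d", status);
}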
This is the buildPipeline method
- (void)buildPipeline
{
    NSBundle *bundle = [NSBundle bundleForClass:[self class]];
    id<MTLLibrary> library = [self.device newDefaultLibraryWithBundle:bundle error:NULL];
    id<MTLFunction> vertexFunc = [library newFunctionWithName:@"vertex_main"];
    id<MTLFunction> fragmentFunc = [library newFunctionWithName:@"fragment_main"];
    MTLRenderPipelineDescriptor *pipelineDescriptor = [MTLRenderPipelineDescriptor new];
    pipelineDescriptor.vertexFunction = vertexFunc;
    pipelineDescriptor.fragmentFunction = fragmentFunc;
    pipelineDescriptor.colorAttachments[0].pixelFormat = self.colorPixelFormat;
    self.pipeline = [self.device newRenderPipelineStateWithDescriptor:pipelineDescriptor error:NULL];
}
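(Side note: error:NULL would hide a failed pipeline build, so for debugging the same call can also be written with an NSError out-parameter:)
// Same call as above, but surfacing the error instead of passing NULL.
NSError *error = nil;
self.pipeline = [self.device newRenderPipelineStateWithDescriptor:pipelineDescriptor error:&error];
if (!self.pipeline)
    NSLog(@"Failed to create pipeline state: %@", error);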
Here is how I actually draw my texture
CVImageBufferRef image = (CVImageBufferRef)CFArrayGetValueAtIndex(_imageBuffers, 0);
id<MTLTexture> textureY = [self getTexture:image pixelFormat:MTLPixelFormatR8Unorm planeIndex:0];
id<MTLTexture> textureCbCr = [self getTexture:image pixelFormat:MTLPixelFormatRG8Unorm planeIndex:1];
if (textureY == NULL || textureCbCr == NULL)
    return;
id<CAMetalDrawable> drawable = self.currentDrawable;
id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
MTLRenderPassDescriptor *renderPass = self.currentRenderPassDescriptor;
renderPass.colorAttachments[0].clearColor = MTLClearColorMake(0.5, 1, 0.5, 1);
id<MTLRenderCommandEncoder> commandEncoder = [commandBuffer renderCommandEncoderWithDescriptor:renderPass];
[commandEncoder setRenderPipelineState:self.pipeline];
[commandEncoder setVertexBuffer:self.vertexBuffer offset:0 atIndex:0];
[commandEncoder setFragmentTexture:textureY atIndex:0];
[commandEncoder setFragmentTexture:textureCbCr atIndex:1];
[commandEncoder setFragmentBuffer:_colorConversionBuffer offset:0 atIndex:0];
[commandEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip vertexStart:0 vertexCount:4 instanceCount:1];
[commandEncoder endEncoding];
[commandBuffer presentDrawable:drawable];
[commandBuffer commit];
This is how I convert a CVPixelBufferRef into an MTLTexture
- (id<MTLTexture>)getTexture:(CVPixelBufferRef)image pixelFormat:(MTLPixelFormat)pixelFormat planeIndex:(int)planeIndex {
    id<MTLTexture> texture;
    size_t width, height;
    if (planeIndex == -1)
    {
        width = CVPixelBufferGetWidth(image);
        height = CVPixelBufferGetHeight(image);
        planeIndex = 0;
    }
    else
    {
        width = CVPixelBufferGetWidthOfPlane(image, planeIndex);
        height = CVPixelBufferGetHeightOfPlane(image, planeIndex);
        NSLog(@"texture %d, %zu, %zu", planeIndex, width, height);
    }
    CVMetalTextureRef textureRef = NULL;
    CVReturn status = CVMetalTextureCacheCreateTextureFromImage(NULL, _textureCache, image, NULL, pixelFormat, width, height, planeIndex, &textureRef);
    if (status == kCVReturnSuccess)
    {
        texture = CVMetalTextureGetTexture(textureRef);
        CFRelease(textureRef);
    }
    else
    {
        NSLog(@"CVMetalTextureCacheCreateTextureFromImage failed with return status %d", status);
        return NULL;
    }
    return texture;
}
This is my fragment shader
fragment float4 fragment_main(Varyings in [[ stage_in ]],
                              texture2d<float, access::sample> textureY [[ texture(0) ]],
                              texture2d<float, access::sample> textureCbCr [[ texture(1) ]],
                              constant ColorConversion &colorConversion [[ buffer(0) ]])
{
    constexpr sampler s(address::clamp_to_edge, filter::linear);
    float3 ycbcr = float3(textureY.sample(s, in.texcoord).r, textureCbCr.sample(s, in.texcoord).rg);
    float3 rgb = colorConversion.matrix * (ycbcr + colorConversion.offset);
    return float4(rgb, 1.0);
}
Because the view and the video I encode are both 666x374, I tried changing the sampling filter in the fragment shader to filter::nearest, expecting a 1:1 pixel match, but the output was just as blurry. Another odd thing I noticed: if you open the uploaded images in a new tab, you'll see they are much larger than 666x374... I doubt I'm making a mistake on the encoding side, and even if I were, AVSampleBufferDisplayLayer still manages to display the video without blur, so it must be doing something right that I'm missing.
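To make the size question concrete, this is the kind of check I can drop into the draw code above to compare the view, the drawable and the decoded frame (self.bounds is in points while self.drawableSize is in pixels, so on a 2x/3x screen they differ by contentScaleFactor):
// Diagnostic only; image is the CVImageBufferRef taken from _imageBuffers in the draw code.
NSLog(@"view bounds: %@, contentScaleFactor: %.1f", NSStringFromCGRect(self.bounds), self.contentScaleFactor);
NSLog(@"drawableSize: %@", NSStringFromCGSize(self.drawableSize));
NSLog(@"pixel buffer: %zu x %zu", CVPixelBufferGetWidth(image), CVPixelBufferGetHeight(image));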