Machine learning algorithms for video processing typically work on frames (images) rather than video.
In my work, I use ffmpeg to dump a specific scene as a sequence of .png files, process them in some way (denoise, deblur, colorize, annotate, inpainting, etc), output the results into an equal number of .png files, and then update the original video with the new frames.
This works well with constant frame-rate (CFR) video. I dump the images as so (eg, 50-frame sequence starting at 1:47):
ffmpeg -i input.mp4 -vf "select='gte(t,107)*lt(selected_n,50)'" -vsync passthrough '107+%06d.png'
And then after editing the images, I replace the originals as so (for a 12.5fps CFR video):
ffmpeg -i input.mp4 -itsoffset 107 -framerate 25/2 -i '107+%06d.png' -filter_complex "[0]overlay=eof_action=pass" -vsync passthrough -c:a copy output.mp4
However, many of the videos I work with are variable frame-rate (VFR), and this has created some challenges.
A simple solution is to convert VFR video to CFR, which ffmpeg wants to do anyway, but I'm wondering if it's possible to avoid this. The reason is that CFR requires either dropping frames - since the purpose of ML video processing is usually to improve the output, I'd like to avoid this - or duplicating frames - but an upscaling algorithm that I'm working with right now uses the previous and next frame for data - if the previous or next frame is a duplicate, then ... no data for upscaling.
With -vsync passthrough
, I had hoped that I could simply remove the -framerate
option, and preserve the original frames as-is, but the resulting command:
ffmpeg -i input.mp4 -itsoffset 107 -i '107+%06d.png' -filter_complex "[0]overlay=eof_action=pass" -vsync passthrough -c:a copy output.mp4
uses ffmpeg's default of 25fps, and drops a lot of frames. Is there a reliable way to replace frames in VFR video?