I have a few NumPy image frames (np.uint8, 4 in total) and want to create an MP4 file from them. That part is easy: it works fine with the mp4v codec. But that codec compresses the frames so heavily that a lot of visual quality is lost. So I tried many different codecs (h264, h265, av1, vp9 and others) and also other container formats (.mkv, .avi). I tried cv2.VideoWriter, skvideo.io.FFmpegWriter and ffmpeg-python, but no matter what I do, the video is either completely or partially corrupted (roughly the last quarter of the frames).
I use VLC media player to check the files. I don't understand what's going on here, and I don't get any error messages. Strangely, the standard Windows video player is sometimes able to play the video, so the frames are not always completely corrupted.
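Since different players disagree about the files, an independent check would be to let ffprobe decode the stream and count the frames it can read. The following is only a minimal sketch, assuming ffprobe is installed and on the PATH; the helper names `build_ffprobe_cmd` and `count_decoded_frames` are hypothetical and not part of any library.

```python
import subprocess


def build_ffprobe_cmd(path):
    """Build an ffprobe command that decodes the first video stream and prints its frame count."""
    return [
        'ffprobe',
        '-v', 'error',                    # only print errors, not the banner
        '-select_streams', 'v:0',         # first video stream
        '-count_frames',                  # actually decode and count the frames
        '-show_entries', 'stream=nb_read_frames',
        '-of', 'default=nokey=1:noprint_wrappers=1',
        path,
    ]


def count_decoded_frames(path):
    """Run ffprobe and return the number of frames it could decode (assumes ffprobe is on PATH)."""
    result = subprocess.run(build_ffprobe_cmd(path), capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

For a file written from 4 intact frames, `count_decoded_frames(path)` would be expected to return 4; a smaller number or an ffprobe error would point at the file rather than at the player.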
The skvideo code:
import cv2
import numpy as np
import skvideo.io

def to_lossless_mp4(img_list: np.array, target_path):
    # shape[:2] also works when the first frame is grayscale (2-D)
    height, width = img_list[0].shape[:2]
    writer = skvideo.io.FFmpegWriter(
        target_path,
        inputdict={
            '-framerate': str(FPS),
            '-f': 'rawvideo',
            '-s': '{}x{}'.format(width, height)
        },
        outputdict={
            '-vcodec': 'libx265',  # H.265/HEVC encoder
            '-crf': '0',
            '-preset': 'veryslow',  # the slower the preset, the better the compression
            '-framerate': str(FPS),
        })
    for image in img_list:
        # some images are grayscale; since all frames need to have the same shape, I do this:
        if len(image.shape) < 3:
            image = np.stack([image] * 3, axis=2)
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        writer.writeFrame(image)
    writer.close()
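The grayscale handling above can also be pulled out and checked separately before anything is written. This is a rough sketch using only NumPy; `to_three_channels` and `check_same_size` are hypothetical helper names, and it assumes the frames already hold values in the uint8 range.

```python
import numpy as np


def to_three_channels(image):
    """Return an H x W x 3 uint8 array; a 2-D grayscale frame is repeated across 3 channels."""
    image = np.asarray(image, dtype=np.uint8)  # assumption: values already fit into uint8
    if image.ndim == 2:
        image = np.stack([image] * 3, axis=2)
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError('expected H x W or H x W x 3, got {}'.format(image.shape))
    return image


def check_same_size(frames):
    """Normalize all frames and make sure they share one (height, width)."""
    frames = [to_three_channels(f) for f in frames]
    sizes = {f.shape[:2] for f in frames}
    if len(sizes) != 1:
        raise ValueError('frames differ in size: {}'.format(sorted(sizes)))
    return frames
```

Running the frame list through `check_same_size` first would rule out a size or channel mismatch as the reason for the broken frames, since a raw video pipe has no way to signal that a frame has the wrong number of bytes.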
The cv2 code:
def to_mp4(img_list: np.array, target_path):
    # shape[:2] also works when the first frame is grayscale (2-D)
    height, width = img_list[0].shape[:2]
    if OVERRIDE or not os.path.exists(target_path):
        out = cv2.VideoWriter(target_path, cv2.VideoWriter_fourcc(*'x265'), FPS, (width, height))
        for image in img_list:
            if len(image.shape) < 3:
                image = np.stack([image] * 3, axis=2)
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
            out.write(image)
        out.release()
The ffmpeg-python code:
def to_mp4(fn, images, framerate=1, vcodec='libx264'):
    if not isinstance(images, np.ndarray):
        images = np.asarray(images)
    n, height, width, channels = images.shape
    process = (
        ffmpeg
        .input('pipe:', format='rawvideo', pix_fmt='rgb24', s='{}x{}'.format(width, height))
        .output(fn, pix_fmt='yuv420p', vcodec=vcodec, r=framerate)
        .overwrite_output()  # has to come before run_async, which returns the subprocess
        .run_async(pipe_stdin=True)
    )
    for frame in images:
        if len(frame.shape) < 3:
            frame = np.stack([frame] * 3, axis=2)
        # the raw frame bytes are written to ffmpeg's stdin
        process.stdin.write(
            frame
            .astype(np.uint8)
            .tobytes()
        )
    process.stdin.close()
    process.wait()
Edit: this is the output I get when I use the code proposed by Rotem:
[00:00<?, ?it/s]ffmpeg version 2.7 Copyright (c) 2000-2015 the FFmpeg developers
built with gcc 4.9.2 (GCC)
configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libdcadec --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-lzma --enable-decklink --enable-zlib
libavutil 54. 27.100 / 54. 27.100
libavcodec 56. 41.100 / 56. 41.100
libavformat 56. 36.100 / 56. 36.100
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 16.101 / 5. 16.101
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 2.100 / 1. 2.100
libpostproc 53. 3.100 / 53. 3.100
[rawvideo @ 00000000051767c0] Stream #0: not enough frames to estimate rate; consider increasing probesize
Input #0, rawvideo, from 'pipe:':
Duration: N/A, start: 0.000000, bitrate: 75497 kb/s
Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 2048x1536, 75497 kb/s, 1 tbr, 1 tbn, 1 tbc
x265 [info]: HEVC encoder version 1.7
x265 [info]: build info [Windows][GCC 4.9.2][64 bit] 8bpp
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
x265 [info]: Main profile, Level-5 (Main tier)
x265 [info]: Thread pool created using 16 threads
x265 [info]: frame threads / pool features : 5 / wpp(24 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 2
x265 [info]: Keyframe min / max / scenecut : 1 / 250 / 40
x265 [info]: Lookahead / bframes / badapt : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb / refs: 1 / 1 / 0 / 3
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.0 / 64 / 1
x265 [info]: Rate Control / qCompress : CRF-24.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=0.30 signhide tmvp strong-intra-smoothing
x265 [info]: tools: deblock sao
Output #0, mp4, to 'C:datamerged_datatestsubsubsub2022-10-12_11-03-44_3583968045500ns_io.mp4':
Metadata:
encoder : Lavf56.36.100
Stream #0:0: Video: hevc (libx265) ([35][0][0][0] / 0x0023), yuv420p, 2048x1536, q=2-31, 1 fps, 16384 tbn, 1 tbc
Metadata:
encoder : Lavc56.41.100 libx265
Stream mapping:
Stream #0:0 -> #0:0 (rawvideo (native) -> hevc (libx265))
frame= 4 fps=0.0 q=0.0 Lsize= 761kB time=00:00:02.00 bitrate=3117.4kbits/s
video:759kB audio:0kB subtitle:0kB other streams:0kB global headers:1kB muxing overhead: 0.236524%
x265 [info]: frame I: 2, Avg QP:13.55 kb/s: 2078.50
x265 [info]: frame P: 1, Avg QP:15.08 kb/s: 1560.14
x265 [info]: frame B: 1, Avg QP:18.28 kb/s: 502.67
x265 [info]: global : 4, Avg QP:15.11 kb/s: 1554.95
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 66.7% 33.3% 0.0% 0.0% 0.0%
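The rawvideo input line of this log can be cross-checked with simple arithmetic: a 2048x1536 frame with 3 bytes per pixel (bgr24) at 1 frame per second should give the reported 75497 kb/s. A small sketch of that calculation (the function name `raw_bitrate_kbps` is just for illustration):

```python
def raw_bitrate_kbps(width, height, bytes_per_pixel, fps):
    """Bitrate of an uncompressed video stream in kb/s (1 kb = 1000 bits), rounded down."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps // 1000


print(raw_bitrate_kbps(2048, 1536, 3, 1))  # 75497
```

The result matches the `75497 kb/s` in the log, which suggests that ffmpeg interprets the piped data as bgr24 frames of the expected size. The log also reports `frame= 4`, so all four frames appear to have reached the encoder in this run.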