I am using cv2 to edit images and create a video from the frames with FFMPEG. See this post for more details.
The images are 3D RGB NumPy arrays (shape is like [h, w, 3]), and they are stored in a Python list.
Yep, I know cv2 has a VideoWriter and I have used it before, but it is very inadequate for my needs. Simply put, it can only use the FFMPEG version that comes bundled with it; that version does not support CUDA, it uses up all the CPU time when generating the videos while not using any GPU time at all, the output is way too big, and I can't pass many FFMPEG parameters to the VideoWriter initialization.
I downloaded precompiled binaries of FFMPEG for Windows with CUDA support here; I am using Windows 10 21H1 x64, and my GPU is an NVIDIA GeForce GTX 1050 Ti.
Anyway, I need to mess with all the parameters found here and there to find the best compromise between quality and compression, like this:
command = '{} -y -stream_loop {} -framerate {} -hwaccel cuda -hwaccel_output_format cuda -i {}/{}_%d.png -c:v hevc_nvenc -preset 18 -tune 1 -rc vbr -cq {} -multipass 2 -b:v {} -vf scale={}:{} {}'
os.system(command.format(FFMPEG, loops-1, fps, tmp_folder, file_name, quality, bitrate, frame_width, frame_height, outfile))
I need to use exactly the binary I downloaded and specify as many parameters as I can to achieve the optimal result.
Currently I can only save the arrays to disk as images and use those images as the input of FFMPEG; that is slow, but I need exactly that binary and all those parameters.
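For reference, this is roughly what the current disk-based step looks like (a minimal sketch of my own; frames, tmp_folder and file_name are placeholder names matching the command above, not code from the linked post):

import os
import cv2

# frames is the Python list of [h, w, 3] RGB NumPy arrays
for i, frame in enumerate(frames):
    # cv2.imwrite expects BGR order, so the RGB frames are converted first
    cv2.imwrite(os.path.join(tmp_folder, '{}_{}.png'.format(file_name, i)),
                cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))

# the FFMPEG command shown above is then run with os.system(...)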
After hours of Google searching I found ffmpeg-python, which seems perfect for the job, and I even found this: I can pass the binary path as an argument to the run function. Here is the code:
import ffmpeg
import io
import numpy as np

def vidwrite(fn, images, framerate=60, vcodec='libx264'):
    if not isinstance(images, np.ndarray):
        images = np.asarray(images)
    _, height, width, channels = images.shape
    process = (
        ffmpeg
        .input('pipe:', format='rawvideo', pix_fmt='rgb24', s='{}x{}'.format(width, height), r=framerate)
        .output(fn, pix_fmt='yuv420p', vcodec=vcodec, r=framerate)
        .overwrite_output()
        .run_async(pipe_stdin=True, overwrite_output=True, pipe_stderr=True)
    )
    for frame in images:
        try:
            process.stdin.write(
                frame.astype(np.uint8).tobytes()
            )
        except Exception as e:  # should probably be a narrower exception related to process.stdin.write
            for line in io.TextIOWrapper(process.stderr, encoding="utf-8"):  # I didn't know how to get the stderr from the process, but this worked for me
                print(line)  # print all the lines in the process's stderr after it has errored
            process.stdin.close()
            process.wait()
            return  # can't write anymore, so end the for loop and the function execution
    # close stdin and wait for FFMPEG to finish writing the file
    process.stdin.close()
    process.wait()
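For what it's worth, calling it on my list of frames would look something like this (my own example, with a made-up output path):

# frames is the Python list of [h, w, 3] RGB NumPy arrays described above
vidwrite('output.mp4', frames, framerate=60, vcodec='libx264')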
However, I need to pass all those parameters, and possibly many more, to the process, and I am not sure where they should go (where should stream_loop go? What about hwaccel, hwaccel_output_format, multipass...?).
How do I properly pipe a bunch of NumPy arrays to an FFMPEG process spawned from a binary that supports CUDA, and pass all sorts of arguments to the initialization of that process?
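My best guess so far is something like the sketch below, but it is untested and based purely on my assumptions about ffmpeg-python: that keyword arguments to input() become input options (so hwaccel, hwaccel_output_format and stream_loop would go there), that keyword arguments to output() become output options (cq, multipass, b:v, vf, ...), and that the path to my downloaded binary goes in the cmd argument of run_async (FFMPEG, quality, bitrate, frame_width and frame_height are the same placeholder variables as in the command above):

import ffmpeg
import numpy as np

FFMPEG = 'C:/path/to/ffmpeg.exe'  # placeholder path to the CUDA-enabled binary I downloaded

def vidwrite_nvenc(fn, images, framerate=60, frame_width=1920, frame_height=1080,
                   quality=30, bitrate='5M'):
    images = np.asarray(images)
    _, height, width, _ = images.shape
    process = (
        ffmpeg
        # my assumption: keyword arguments here become input options (placed before -i)
        # stream_loop would presumably also go here, though I'm not sure it even makes
        # sense with a piped rawvideo input
        .input('pipe:', format='rawvideo', pix_fmt='rgb24',
               s='{}x{}'.format(width, height), r=framerate,
               hwaccel='cuda', hwaccel_output_format='cuda')
        # my assumption: keyword arguments here become output options;
        # options with a colon in the name (like b:v) need the **{} form
        .output(fn, vcodec='hevc_nvenc', pix_fmt='yuv420p', r=framerate,
                preset=18, tune=1, rc='vbr', cq=quality, multipass=2,
                vf='scale={}:{}'.format(frame_width, frame_height),
                **{'b:v': bitrate})
        .overwrite_output()
        # my assumption: cmd= makes the process use my downloaded binary instead of
        # whatever ffmpeg is on PATH
        .run_async(cmd=FFMPEG, pipe_stdin=True, pipe_stderr=True)
    )
    for frame in images:
        process.stdin.write(frame.astype(np.uint8).tobytes())
    process.stdin.close()
    process.wait()

If that is the right place for these options I would still like confirmation, especially for stream_loop and the hwaccel options, since I don't know how they interact with a piped input.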