I am trying to take advantage of the video encoding capabilities of my Nvidia GPU by using it, instead of the CPU, to save a stream of numpy arrays to an .mp4 or .avi file. With this I intend to:
- offload work from my CPUs, so they can do other things at the same time
- potentially speed up the encoding
In order to do that, I have created a sample repository that implements this functionality. The ffmpeg call that uses the GPU (NVENC) encoder looks as follows:
ffmpeg -y -f rawvideo -pix_fmt rgb24 -vsync 0 -extra_hw_frames 2 -s 2000x2000 -r 45 -i - -an -c:v h264_nvenc output.mp4
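For context, the raw rgb24 stream piped over stdin at this resolution and frame rate is sizeable; a quick back-of-the-envelope calculation using only the numbers from the command above:

width, height, channels, fps = 2000, 2000, 3, 45
bytes_per_frame = width * height * channels      # 12,000,000 bytes, about 11.4 MiB per frame
bytes_per_second = bytes_per_frame * fps         # 540,000,000 bytes, about 515 MiB/s
print(f"{bytes_per_frame / 2**20:.1f} MiB per frame, {bytes_per_second / 2**20:.0f} MiB/s")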
As you see, ffmpeg receives data from standard input, which is provided by the ffmpeg_gpu_benchmark.py script. However, even when selecting the GPU encoder with -c:v h264_nvenc, I observe that ffmpeg and Python still take up a lot of CPU time. Why could that be?
For the sake of completeness, here is the relevant code:
def run_v2(camera, proc, args, pb, data=None):
    try:
        if data is None:
            frame = read_frame(camera=camera, color=args.color, height=args.height, width=args.width)
            data = frame
        write_to_ffmpeg(proc, data)
        if pb: pb.update(1)
        if args.preview:
            cv2.imshow("frame", cv2.resize(frame, (300, 300)))
            if cv2.waitKey(1) == ord("q"):
                raise QuitException
        return 0
    except (KeyboardInterrupt, QuitException):
        stop_camera(camera)
        return 1
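For reference, run_v2 is called repeatedly by a driver loop roughly like this (a sketch based on the return values above; the actual loop in ffmpeg_gpu_benchmark.py may differ):

# Hypothetical calling loop; camera, proc, args and pb are assumed to be set up earlier
while True:
    if run_v2(camera, proc, args, pb) != 0:
        break

# Close stdin so ffmpeg can finish encoding, then wait for it to exit
proc.stdin.close()
proc.wait()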
where proc is a subprocess.Popen created like so:
proc = subprocess.Popen(
    cmd,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    shell=False
)
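Here cmd is (assuming it mirrors the invocation above, since shell=False requires a list of tokens) something along the lines of:

cmd = [
    "ffmpeg", "-y",
    "-f", "rawvideo", "-pix_fmt", "rgb24",
    "-vsync", "0", "-extra_hw_frames", "2",
    "-s", "2000x2000", "-r", "45",
    "-i", "-",               # read raw frames from stdin
    "-an",                   # no audio
    "-c:v", "h264_nvenc",    # encode with the GPU's NVENC encoder
    "output.mp4",
]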
The write_to_ffmpeg helper just runs this:
# data is an np.array with dtype np.uint8
proc.stdin.write(data)
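(A minimal sketch of what that helper amounts to; the contiguity guard is my assumption, and the repository version may also flush or handle broken pipes.)

import numpy as np

def write_to_ffmpeg(proc, data: np.ndarray) -> None:
    # BufferedWriter.write accepts any bytes-like object, so a C-contiguous
    # uint8 array can be written to ffmpeg's stdin directly
    proc.stdin.write(np.ascontiguousarray(data))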
This is true even when I set frame to a constant random frame created with np.random.randint when the program starts (see the sketch below), so the CPU usage is not due to latency in frame acquisition.
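(For completeness, the constant test frame is created once at startup with something like the following; the exact shape is my assumption, matching -s 2000x2000 and -pix_fmt rgb24 from the command above.)

import numpy as np

# Hypothetical constant frame reused for every write, ruling out acquisition cost
frame = np.random.randint(0, 256, size=(2000, 2000, 3), dtype=np.uint8)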
PS: I am doing this because, unfortunately, the CUDA-based VideoWriter class from OpenCV is only supported on Windows and not on Linux.