For semantic segmentation inference, I'd like to extract the frames of a video and run them through a network without using OpenCV's VideoCapture, i.e. without something like (as answered in this question):
vidcap = cv2.VideoCapture('testvideo.mp4')
The reason is that I don't want to install such a heavy package just to get frames out of a video.
What I currently do is extract the frames with ffmpeg:
ffmpeg -i testvideo.mp4 frames/%05d.png
I then load the frames, run inference on each one, and save the results. I've seen packages like ffmpeg-python, which include examples of loading a video into NumPy arrays, but I haven't managed to get them working.
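For reference, loading the extracted PNGs only needs Pillow and NumPy, both far lighter than OpenCV. A minimal sketch of my current loop; `run_inference` is a stand-in for the actual network, and the `frames/` directory and `.out.png` naming are just illustrative:

```python
from pathlib import Path

import numpy as np
from PIL import Image


def segment_frames(frame_dir: str, run_inference) -> int:
    """Load each extracted PNG in order, run inference, save the result.

    Returns the number of frames processed.
    """
    count = 0
    for path in sorted(Path(frame_dir).glob("*.png")):
        # Decode the PNG into an H x W x 3 uint8 array
        frame = np.asarray(Image.open(path).convert("RGB"))
        mask = run_inference(frame)  # placeholder for the segmentation network
        Image.fromarray(mask).save(path.with_suffix(".out.png"))
        count += 1
    return count
```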
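One way to skip the intermediate PNGs entirely is to pipe ffmpeg's rawvideo output straight into NumPy using only the standard library's subprocess module, which is essentially what the ffmpeg-python NumPy example does under the hood. A sketch, assuming the frame dimensions are known in advance (ffprobe can report them):

```python
import subprocess

import numpy as np


def iter_frames(video_path: str, width: int, height: int):
    """Yield H x W x 3 uint8 RGB frames decoded by an ffmpeg subprocess."""
    cmd = [
        "ffmpeg", "-i", video_path,
        "-f", "rawvideo", "-pix_fmt", "rgb24",  # raw RGB bytes on stdout
        "pipe:1",
    ]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    frame_bytes = width * height * 3
    try:
        while True:
            raw = proc.stdout.read(frame_bytes)
            if len(raw) < frame_bytes:  # end of stream
                break
            yield np.frombuffer(raw, np.uint8).reshape(height, width, 3)
    finally:
        proc.stdout.close()
        proc.wait()
```

Each yielded array can be fed to the network directly, so no frames ever touch the disk.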