It's worth understanding up front that these are bindings for FFmpeg, which does all the actual work. It's useful to understand the ffmpeg program itself, in particular the command-line arguments it takes. There is a lot there, but you can learn it a piece at a time according to your actual needs.
Your existing input stream:
process = (
    ffmpeg.input("http://192.168.1.78:8080").output(
        '-',
        format='matroska',
        acodec='libvorbis',
        vcodec='libx264'
    ).run_async(pipe_stdout=True, pipe_stderr=True)
)
Let's compare that to the one in the example partway down the documentation, titled "Process video frame-by-frame using numpy:" (I reformatted it a little to match):
process1 = (
    ffmpeg.input(in_filename).output(
        'pipe:',
        format='rawvideo',
        pix_fmt='rgb24'
    ).run_async(pipe_stdout=True)
)
It does not matter whether we use a file or a URL for our input source - ffmpeg.input figures that out for us, and at that point we just have an ffmpeg.Stream either way. (Just like we could use either as the -i argument to the command-line ffmpeg program.)
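For instance (a tiny sketch; the URL and filename here are just placeholders):

# Either of these gives us a Stream we can chain .output() onto;
# the string ends up as ffmpeg's -i argument in both cases.
stream_from_url = ffmpeg.input('http://192.168.1.78:8080')
stream_from_file = ffmpeg.input('some_recording.mkv')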
The next step is to specify how the stream outputs (i.e., what kind of data we will get when we read from the stdout of the process). The documentation's example uses 'pipe:' to specify writing to stdout; this should be the same as '-'. The documentation's example does not pipe_stderr, but that shouldn't matter, since we don't plan to read from stderr either way.
The key difference is that we specify a format that we know how to handle. 'rawvideo' means exactly what it sounds like, and is suitable for reading the data into a Numpy array. (This is what we would pass as a -f option at the command line.)
The pix_fmt keyword parameter sets the pixel format: 'rgb24' means 24 bits per pixel, representing red, green and blue components at one byte each. There are a bunch of pre-defined values for this, which you can see with ffmpeg -pix_fmts. And, yes, you would specify this as -pix_fmt at the command line.
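If you want to see the command line this corresponds to without running anything, the stream returned by .output() should also have compile() and get_args() methods that just build the argument list. A quick sketch (the exact ordering of the generated arguments is up to the library):

# Build (but don't run) the command ffmpeg-python would execute.
args = (
    ffmpeg.input(in_filename).output(
        'pipe:',
        format='rawvideo',
        pix_fmt='rgb24'
    ).compile()
)
print(args)
# Expect something along the lines of (order may vary):
# ['ffmpeg', '-i', '<in_filename>', '-f', 'rawvideo', '-pix_fmt', 'rgb24', 'pipe:']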
Having created such an input stream, we can read from its stdout and create Numpy arrays from each piece of data. We don't want to read data in arbitrary "packet" sizes for this; we want to read exactly as much data as is needed for one frame. That will be the width of the video, times the height, times three (for RGB components at 1 byte each). Which is exactly what we see later in the example:
while True:
    in_bytes = process1.stdout.read(width * height * 3)
    if not in_bytes:
        break
    in_frame = (
        np
        .frombuffer(in_bytes, np.uint8)
        .reshape([height, width, 3])
    )
Pretty straightforward: we iteratively read that amount of data, check for the end of the stream, and then create the frame with standard Numpy stuff.
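Of course, this requires knowing width and height before reading. The documentation's example gets them by probing the input first; roughly like this (a sketch, assuming the input has at least one video stream that ffprobe can report on):

import ffmpeg

# Ask ffprobe about the input and pull the dimensions from the first
# video stream it reports.
probe = ffmpeg.probe(in_filename)
video_info = next(s for s in probe['streams'] if s['codec_type'] == 'video')
width = int(video_info['width'])
height = int(video_info['height'])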
Notice that at no point here did we attempt to separate audio and video - this is because the rawvideo output format, as the name implies, won't output any audio data. We don't need to explicitly select the video from the input stream in order to drop the audio. But we can - it's as simple as shown at the top of the documentation: ffmpeg.input(...).video.output(...). Similarly for audio.
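For example, a version that selects the video stream explicitly might look like this (a sketch; the result should be the same as before):

# Explicitly pick out the video stream before describing the output.
process1 = (
    ffmpeg.input(in_filename).video.output(
        'pipe:',
        format='rawvideo',
        pix_fmt='rgb24'
    ).run_async(pipe_stdout=True)
)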
We can process the audio by creating a separate stream. Choose an appropriate audio format, and specify any other needed arguments. So, perhaps something like:
process2 = (
    ffmpeg.input(in_filename).output(
        'pipe:',
        format='s16le',
        ar='44100'  # audio sample rate; equivalent to -ar on the command line
    ).run_async(pipe_stdout=True)
)
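Reading from that process works the same way as for the video, except that raw audio has no natural "frame" size, so we pick a chunk size ourselves. A sketch, assuming the s16le output above and stereo audio (the channel count and chunk size here are assumptions, not something the stream tells us):

import numpy as np

channels = 2           # assumption: stereo audio
bytes_per_sample = 2   # s16le is signed 16-bit little-endian
chunk_samples = 44100  # read roughly one second at a time at the rate above

while True:
    in_bytes = process2.stdout.read(chunk_samples * channels * bytes_per_sample)
    if not in_bytes:
        break
    # One row per sample, one column per channel. Real code may want to
    # buffer until it has a whole number of samples before reshaping.
    chunk = (
        np
        .frombuffer(in_bytes, np.int16)
        .reshape([-1, channels])
    )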