For splitting the video and audio, you may map the video output to the stderr pipe and map the audio output to the stdout pipe.
 -----------       -------------       -----------
| Input     |     |   FFmpeg    |---->| Raw Video |----> stderr (pipe)
| Video and |---->| sub-process |      -----------
| Audio     |     |             |      -----------
 -----------       ------------- ---->| Raw Audio |----> stdout (pipe)
                                       -----------
To keep the demonstration of the concept simple, the example uses synthetic video and audio as input.
The following FFmpeg CLI command may be used as a reference:
ffmpeg -y -f lavfi -i testsrc=size=192x108:rate=1:duration=10 -f lavfi -i sine=frequency=400:r=16384:duration=10 -vcodec rawvideo -pix_fmt rgb24 -map 0:v -f:v rawvideo vid.yuv -map 1:a -acodec pcm_s16le -ar 16384 -ac 1 -f:a s16le aud.pcm
The above command creates synthetic video and synthetic audio, maps the raw video to the vid.yuv file, and maps the raw audio to the aud.pcm file.
For testing, execute the above command, and keep vid.yuv and aud.pcm as references.
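As a quick sanity check, the expected file sizes follow directly from the command parameters: the video is 192*108 pixels * 3 bytes (rgb24) * 10 frames = 622080 bytes, and the audio is 16384 samples/sec * 2 bytes (s16le) * 1 channel * 10 sec = 327680 bytes. Here is a minimal sketch (assuming the reference command above was executed in the current directory):

import os

# Expected sizes derived from the command-line parameters:
assert os.path.getsize('vid.yuv') == 192 * 108 * 3 * 10  # 622080 bytes of raw rgb24 video
assert os.path.getsize('aud.pcm') == 16384 * 2 * 10      # 327680 bytes of raw s16le mono audio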
Instead of mapping the output to files, we may map the output to stderr and stdout:
ffmpeg -hide_banner -loglevel error -f lavfi -i testsrc=size=192x108:rate=1:duration=10 -f lavfi -i sine=frequency=400:r=16384:duration=10 -vcodec rawvideo -pix_fmt rgb24 -map 0:v -f:v rawvideo pipe:2 -acodec pcm_s16le -ar 16384 -ac 1 -map 1:a -f:a s16le pipe:1 -report
Since we are using stderr for the video output, we need to avoid any other printing to stderr, so we add the -hide_banner -loglevel error arguments.
The Python code sample uses the subprocess module instead of ffmpeg-python.
I just couldn't figure out how to apply the mapping with the ffmpeg-python module...
The Python code sample applies the following stages:
- Execute FFmpeg as a sub-process using sp.Popen. Apply stdout=sp.PIPE, stderr=sp.PIPE for "capturing" the stdout and stderr output pipes.
- Start the video reader thread. The video reader thread reads raw video frames from stderr, and writes the video to a video.yuv binary file for testing.
- Start the audio reader thread. The audio reader thread reads raw audio samples from stdout, and writes the audio to an audio.pcm binary file for testing.
- Wait for the threads and the process to finish.
Here is a "stand alone" code sample:
import subprocess as sp
import shlex
import threading

# Reference command line (stores the video to vid.yuv and the audio to aud.pcm):
# sp.run(shlex.split('ffmpeg -y -f lavfi -i testsrc=size=192x108:rate=1:duration=10 -f lavfi -i sine=frequency=400:r=16384:duration=10 -vcodec rawvideo -pix_fmt rgb24 -map 0:v -f:v rawvideo vid.yuv -acodec pcm_s16le -ar 16384 -ac 1 -map 1:a -f:a s16le aud.pcm'))


# Video reader thread.
def video_reader(pipe):
    f = open('video.yuv', 'wb')  # For testing - store the raw video to video.yuv (binary file)

    while True:
        frame = pipe.read(192*108*3)  # Read one raw video frame (192x108 pixels, 3 bytes per pixel)

        # Break the loop when the length is too small (end of stream)
        if len(frame) < 192*108*3:
            break

        f.write(frame)

    f.close()


# Audio reader thread.
def audio_reader(pipe):
    f = open('audio.pcm', 'wb')  # For testing - store the raw audio to audio.pcm (binary file)

    while True:
        samples = pipe.read(4096)  # Read raw audio packets (4096 bytes = 2048 samples in pcm_s16le format)

        # Break the loop when the length is too small (end of stream)
        if len(samples) < 4096:
            break

        f.write(samples)

    f.close()


# Execute FFmpeg as a sub-process.
# Map the video to stderr and map the audio to stdout.
process = sp.Popen(shlex.split('ffmpeg -hide_banner -loglevel error '                  # Set loglevel to error for disabling the prints to stderr
                               '-f lavfi -i testsrc=size=192x108:rate=1:duration=10 '  # Synthetic video: 192x108 at 1Hz (10 seconds)
                               '-f lavfi -i sine=frequency=400:r=16384:duration=10 '   # Synthetic audio: mono, 16384 samples per second (10 seconds)
                               '-vcodec rawvideo -pix_fmt rgb24 '                      # Raw video codec with rgb24 pixel format
                               '-map 0:v -f:v rawvideo pipe:2 '                        # rawvideo format is mapped to the stderr pipe
                               '-acodec pcm_s16le -ar 16384 -ac 1 '                    # Audio codec pcm_s16le (-ar 16k has no effect here)
                               '-map 1:a -f:a s16le pipe:1 '                           # s16le audio format is mapped to the stdout pipe
                               '-report'),                                             # Create a log file (because we can't see the statuses that are usually printed to stderr)
                   stdout=sp.PIPE, stderr=sp.PIPE)

# Start the video reader thread (pass the stderr pipe as argument).
video_thread = threading.Thread(target=video_reader, args=(process.stderr,))
video_thread.start()

# Start the audio reader thread (pass the stdout pipe as argument).
audio_thread = threading.Thread(target=audio_reader, args=(process.stdout,))
audio_thread.start()

# Wait for the threads (and the process) to finish.
video_thread.join()
audio_thread.join()
process.wait()
For validation, compare video.yuv with vid.yuv, and audio.pcm with aud.pcm.
The files should be identical.
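The comparison can also be scripted with the standard filecmp module (a minimal sketch, assuming all four files are in the current directory):

import filecmp

# shallow=False forces a byte-by-byte comparison (instead of comparing os.stat signatures only).
print(filecmp.cmp('vid.yuv', 'video.yuv', shallow=False))  # Expected: True
print(filecmp.cmp('aud.pcm', 'audio.pcm', shallow=False))  # Expected: True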
Notes:
- The answer is just a "proof of concept" template. For adding filters (like flip), and for getting the input from in_url, there is still some work to do.
- When having only one input, the mapping is -map 0:v and -map 0:a.
- The resolution of the input video should be known in advance (see the ffprobe sketch below).
- In case the audio source is not mono audio with 16384 samples per second, you may need to use the aresample filter (-ar 16k may not work).
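For example, when the input is a real file rather than a synthetic source, the resolution may be probed with ffprobe before starting FFmpeg. Here is a minimal sketch (in.mp4 is a hypothetical input file name, not part of the example above):

import shlex
import subprocess as sp

# Query the width and height of the first video stream (in.mp4 is a hypothetical input file).
out = sp.run(shlex.split('ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of csv=p=0 in.mp4'),
             capture_output=True, text=True).stdout
width, height = map(int, out.split(','))  # e.g. 1920, 1080

For the resampling case, something like -af aresample=16384 (possibly combined with -ac 1 for mono) should do the conversion explicitly, but treat that as an assumption to verify against your actual input.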