
I already created a similar thread, How to extract video and audio from ffmpeg stream in python, but with the advice I received there I could not implement what I asked about. I was not able to reopen that question to formulate it in more detail, so I'm asking it here again.

I'm trying to parse the data (video and audio) from an ffmpeg stream frame by frame. Ideally, I would like to get one numpy array per data type per unit of time. I studied an example of getting a numpy array from audio and from video separately (when the stream contains only one or the other), and that works fine. The confusion starts when the stream contains both video and audio: it is unclear how many bytes to read, and more importantly, whether the bytes that are read belong to the audio or to the video.

import ffmpeg

input_stream = ffmpeg.input(in_url)  # in_url is the source stream URL
video = input_stream.video.vflip()
out_stream = ffmpeg.output(
    input_stream.audio,
    video,
    filename='pipe:',
    format='rawvideo',
    pix_fmt='rgb24',
    acodec='pcm_s16le',
    ac=1,
    ar='16k'
)
process = out_stream.run_async(pipe_stdout=True)
while True:
    in_bytes = process.stdout.read(4096)
    # what next ?

Thanks for your help.

user5285766
  • Sorry to hear my `ffmpegio` solution didn't work for you (I've added a full example to my answer in case you missed it). AFAIK, there is no way you can tell apart video and audio samples in `rawvideo` format. As the name suggests, it is intended to carry only a video stream. Knowing the video frame size and `rgb24`, a video frame is always height*width*3 bytes, but an audio frame is likely variable length to stay synced to the video feed. For example, FFmpeg creates 1024-sample audio frames in an AVI container, but how many audio frames fall between video frames is variable – kesh Feb 09 '22 at 00:51
  • I tested with different formats, and at the moment I have gotten the script to produce two streams from the source, audio and video. The principle of computing the byte size for the video is clear, and there are plenty of examples of how to process the audio, but I can't find information anywhere on how to process both at the same time. – user5285766 Feb 09 '22 at 21:32
  • I tried to naively add up the sizes for the audio and the video and read that many bytes at a time, then cut off a piece for the audio and process what is left as video, and vice versa. It did not work) – user5285766 Feb 09 '22 at 21:34
  • @kesh I found a way to split the video and the audio. You may find my answer interesting. – Rotem Feb 09 '22 at 22:32
  • @Rotem - haha you beat me to my comment on yours. yes indeed – kesh Feb 09 '22 at 22:34

1 Answer


For splitting the video and the audio, you may map the video output to the stderr pipe and the audio output to the stdout pipe.

                                            -----------
                                       --->| Raw Video | ---> stderr (pipe)
 -----------        -------------     |     -----------    
| Input     |      | FFmpeg      |    |
| Video and | ---> | sub-process | ---      
| Audio     |      |             |    |    
 -----------        -------------     |     -----------
                                       --->| Raw Audio | ---> stdout (pipe)
                                            -----------

For creating a simple demonstration of the concept, the example uses synthetic video and audio as input.

The following FFmpeg CLI command may be used as reference:

ffmpeg -y -f lavfi -i testsrc=size=192x108:rate=1:duration=10 -f lavfi -i sine=frequency=400:r=16384:duration=10 -vcodec rawvideo -pix_fmt rgb24 -map 0:v -f:v rawvideo vid.yuv -map 1:a -acodec pcm_s16le -ar 16384 -ac 1 -f:a s16le aud.pcm

The above command creates synthetic video and synthetic audio, maps the raw video to the file vid.yuv, and maps the raw audio to the file aud.pcm.
For testing, execute the above command and keep vid.yuv and aud.pcm as references.

Instead of mapping the output to files, we may map the output to stderr and stdout:

ffmpeg -hide_banner -loglevel error -f lavfi -i testsrc=size=192x108:rate=1:duration=10 -f lavfi -i sine=frequency=400:r=16384:duration=10 -vcodec rawvideo -pix_fmt rgb24 -map 0:v -f:v rawvideo pipe:2 -acodec pcm_s16le -ar 16384 -ac 1 -map 1:a -f:a s16le pipe:1 -report

Since we are using stderr for the video output, we need to avoid any other printing to stderr, so we add the -hide_banner -loglevel error arguments.


The Python code sample uses the subprocess module instead of ffmpeg-python.
I just couldn't figure out how to apply the mapping with the ffmpeg-python module...
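For reference, here is an untested guess at how that mapping might be expressed with ffmpeg-python's merge_outputs; the 'pipe:1'/'pipe:2' output targets and the placeholder in_url value are assumptions, and only the subprocess version below was actually verified:

import ffmpeg

in_url = 'input.mp4'  # Hypothetical placeholder for the question's stream URL

inp = ffmpeg.input(in_url)

# Map raw video to stderr (pipe:2) and raw audio to stdout (pipe:1)
video_out = ffmpeg.output(inp.video, 'pipe:2', format='rawvideo', pix_fmt='rgb24')
audio_out = ffmpeg.output(inp.audio, 'pipe:1', format='s16le', acodec='pcm_s16le', ac=1, ar='16384')

process = (
    ffmpeg
    .merge_outputs(video_out, audio_out)                   # Both outputs in one FFmpeg command line
    .global_args('-hide_banner', '-loglevel', 'error')     # Keep stderr free for the raw video
    .run_async(pipe_stdout=True, pipe_stderr=True)
)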

The Python code sample applies the following stages:

  • Execute FFmpeg as sub-process using sp.Popen.
    Apply stdout=sp.PIPE, stderr=sp.PIPE for "capturing" stdout and stderr output pipes.
  • Start video reader thread.
    The video reader thread reads raw video frames from stderr.
    The thread writes the video to video.yuv binary file for testing.
  • Start audio reader thread.
    The audio reader thread reads raw audio samples from stdout.
    The thread writes the audio to audio.pcm binary file for testing.
  • Wait for the threads and the process to finish.

Here is a "stand alone" code sample:

import subprocess as sp
import shlex
import threading

# Reference command line (stores the video to vid.yuv and the audio to aud.pcm):
# sp.run(shlex.split('ffmpeg -y -f lavfi -i testsrc=size=192x108:rate=1:duration=10 -f lavfi -i sine=frequency=400:r=16384:duration=10 -vcodec rawvideo -pix_fmt rgb24 -map 0:v -f:v rawvideo vid.yuv -acodec pcm_s16le -ar 16384 -ac 1 -map 1:a -f:a s16le aud.pcm'))

# Video reader thread.
def video_reader(pipe):
    f = open('video.yuv', 'wb')  # For testing - store the raw video to video.yuv (binary file)

    while True:
        frame = pipe.read(192*108*3)  # Read raw video frame

        # Break the loop when length is too small
        if len(frame) < 192*108*3:
            break

        f.write(frame)

    f.close()


# Audio reader thread.
def audio_reader(pipe):
    f = open('audio.pcm', 'wb')  # For testing - store the raw audio to audio.pcm (binary file)

    while True:
        samples = pipe.read(4096)  # Read raw audio packets (read 2048 samples in pcm_s16le format).

        # Break the loop when length is too small
        if len(samples) < 4096:
            break

        f.write(samples)

    f.close()



# Execute FFmpeg as sub-process
# Map the video to stderr and map the audio to stdout
process = sp.Popen(shlex.split('ffmpeg -hide_banner -loglevel error '                  # Set loglevel to error for disabling the prints to stderr
                               '-f lavfi -i testsrc=size=192x108:rate=1:duration=10 '  # Synthetic video 192x108 at 1Hz (10 seconds)
                               '-f lavfi -i sine=frequency=400:r=16384:duration=10 '   # Synthetic audio mono, 16384 samples per second (10 seconds)
                               '-vcodec rawvideo -pix_fmt rgb24 '                      # Raw video codec with rgb24 pixel format                               
                               '-map 0:v -f:v rawvideo pipe:2 '                        # rawvideo format is mapped to stderr pipe
                               '-acodec pcm_s16le -ar 16384 -ac 1 '                    # Audio codec pcm_s16le (-ar 16k has no effect)
                               '-map 1:a -f:a s16le pipe:1 '                           # s16le audio format is mapped to stdout pipe
                               '-report'),                                             # Create a log file (because we can't see the status messages that are usually printed to stderr).
                                stdout=sp.PIPE, stderr=sp.PIPE)


# Start video reader thread (pass stderr pipe as argument).
video_thread = threading.Thread(target=video_reader, args=(process.stderr,))
video_thread.start()

# Start audio reader thread (pass stdout pipe as argument).
audio_thread = threading.Thread(target=audio_reader, args=(process.stdout,))
audio_thread.start()


# Wait for threads (and process) to finish.
video_thread.join()
audio_thread.join()
process.wait()

For validation, compare video.yuv with vid.yuv and audio.pcm with aud.pcm.
The files should be identical.
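
For example, a quick byte-for-byte check in Python (assuming all four files are in the current directory):

import filecmp

# Compare the piped outputs against the reference files created by the first command.
print(filecmp.cmp('vid.yuv', 'video.yuv', shallow=False))  # Expected: True
print(filecmp.cmp('aud.pcm', 'audio.pcm', shallow=False))  # Expected: True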


Notes:

  • The answer is just a "proof of concept" template.
    For adding filters (like the flip) and for getting the input from in_url, there is still some work to do.

  • When there is only one input, the mapping is -map 0:v and -map 0:a.

  • The resolution of the input video should be known in advance.

  • In case the audio source is not mono audio with 16384 samples per second, you may need to use the aresample filter (-ar 16k may not work).
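
Since the question asks for numpy arrays rather than files, each chunk read in the reader threads may be converted with np.frombuffer. A minimal sketch, assuming the 192x108 rgb24 video and 16-bit mono audio used above:

import numpy as np

WIDTH, HEIGHT = 192, 108
FRAME_BYTES = WIDTH * HEIGHT * 3  # One rgb24 video frame

def video_bytes_to_array(frame):
    # Raw rgb24 bytes -> (height, width, 3) uint8 array
    return np.frombuffer(frame, np.uint8).reshape(HEIGHT, WIDTH, 3)

def audio_bytes_to_array(samples):
    # Raw pcm_s16le bytes -> 1D int16 array (mono)
    return np.frombuffer(samples, np.int16)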

Rotem
  • Wow, pretty cool idea. Didn't think of using `stderr` for the 2nd output – kesh Feb 09 '22 at 22:33
  • I think you meant to put `'-acodec -pcm_s16le -ar 16k -ac 1 '` line after `'-map 0:v -f:v rawvideo pipe:2 '` – kesh Feb 09 '22 at 23:23
  • You are right, thanks. That was the reason `-ar 16k` had no effect. – Rotem Feb 10 '22 at 06:56
  • Of course, the option of reading from two places came to my mind, but I did not implement it because from the very beginning it seemed difficult in terms of synchronizing the audio and video frames in time, which is important for my purpose. Thank you very much for your reply; at the moment this is probably the only option that suits me. – user5285766 Feb 10 '22 at 10:38
  • @user5285766 - "difficult [...] synchronizing audio and video frames" All you need to do is to count samples and frames. You have the sampling and frame rates already in your hands. Then, use `queue.Queue` to pass the known-duration blocks of data b/w threads. – kesh Feb 10 '22 at 16:33
  • the problem is that it's a stream – user5285766 Feb 10 '22 at 17:04
  • In the thread, the guys mentioned pass_fds, but I was unable to get the second stream wrapped. If someone can help me with an example, I'd be grateful. In short, the problem of parallelizing and processing video and audio from the stream is still not solved. In general, it is strange that so much effort is required to solve such a simple, typical task; maybe I'm just missing something) – user5285766 Feb 24 '22 at 01:49
  • Unfortunately, I did not manage to take full advantage of your help. At the moment, the data arrives correctly only through stdout. When I stream through stderr, the bytes seem to reach the thread, but after saving I cannot view or listen to the video or audio (depending on what was sent to stderr). I tried to do it by analogy with your example, but the problems started at the step of generating the synthetic video and audio. The audio from your example is fine and I can listen to that file using ffplay, but unfortunately I can't view the video (YUViewer). @Rotem – user5285766 Feb 24 '22 at 01:50
  • Try: `ffplay -framerate 1 -f rawvideo -video_size 192x108 -pixel_format rgb24 video.yuv`. In my machine the video is perfect. The way I verified the solution is documented in the answer. – Rotem Feb 24 '22 at 06:39
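
A minimal, hypothetical sketch of the queue-based pairing that kesh describes in the comments (assuming the 1 fps video and 16384 Hz mono s16le audio from the answer, so one video frame corresponds to exactly 16384 audio samples; these reader functions would replace the file-writing versions above):

import queue
import numpy as np

WIDTH, HEIGHT, FPS = 192, 108, 1
SAMPLE_RATE = 16384

FRAME_BYTES = WIDTH * HEIGHT * 3         # One rgb24 video frame
AUDIO_BYTES = (SAMPLE_RATE // FPS) * 2   # Audio bytes per video frame (s16le mono, 2 bytes per sample)

video_q = queue.Queue()
audio_q = queue.Queue()

def video_reader(pipe):
    while True:
        data = pipe.read(FRAME_BYTES)
        if len(data) < FRAME_BYTES:
            break
        video_q.put(np.frombuffer(data, np.uint8).reshape(HEIGHT, WIDTH, 3))

def audio_reader(pipe):
    while True:
        data = pipe.read(AUDIO_BYTES)
        if len(data) < AUDIO_BYTES:
            break
        audio_q.put(np.frombuffer(data, np.int16))

# Consumer: each iteration pairs one video frame with the matching block of audio.
# frame = video_q.get()
# samples = audio_q.get()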