I have two .mp4 files, both reporting 8 (7.1) audio channels. I've been told, however, that one carries stereo audio plus 2 SAP channels (secondary audio on channels 7-8), while the other carries 6 (5.1) audio channels plus 2 SAP channels (also on channels 7-8). So the latter has real audio in channels such as Center, whereas in the stereo file those channels exist but are apparently silent.

I've tried to find metadata that distinguishes the two using MediaInfo, but the metadata for both files looks exactly the same. I also tried some basic metadata retrieval with ffmpeg and ffprobe, and again both files look the same:

ffprobe -i 2ch.mp4 -show_streams -select_streams a:0

So the question is: does ffmpeg or ffprobe offer a quick way to tell the two apart? Is there an audio filter that can detect whether a specific audio channel is silent? Or is there any other distinguishing metadata? I would prefer to distinguish the two through metadata rather than content analysis.

This is a sample of the 2-channel mp4 file, and this one is a sample of the 6-channel mp4 file.

Tina J

1 Answer

Both of your sample files have 4 audio streams or tracks. Each audio track has 2 channels, with a layout of stereo.
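
The stream layout can be checked programmatically. The following is a minimal sketch, assuming `ffprobe` is on the PATH; `probe_command`, `summarize_audio_streams` and `probe_file` are hypothetical helper names, not part of ffprobe.

```python
import json
import subprocess


def probe_command(path):
    """Build an ffprobe call that reports every audio stream as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a",
        "-show_entries", "stream=index,codec_name,channels,channel_layout",
        "-of", "json",
        path,
    ]


def summarize_audio_streams(ffprobe_json):
    """Return (stream index, channel count, layout) for each audio stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [
        (s.get("index"), s.get("channels"), s.get("channel_layout", "unknown"))
        for s in streams
    ]


def probe_file(path):
    """Run ffprobe on a real file (requires ffprobe to be installed)."""
    result = subprocess.run(
        probe_command(path), capture_output=True, text=True, check=True
    )
    return summarize_audio_streams(result.stdout)
```

Calling `probe_file("2ch.mp4")` should return one tuple per audio stream, so a file that looks like "8 channels" in a summary view would show up here as several separate 2-channel streams.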

The audio appears to have been encoded at a constant bit rate, so the metadata cannot be used to distinguish silent tracks from sound-bearing ones.

You'll need to decode and analyze each suspect audio stream.

ffmpeg -i file -map 0:a:1 -af astats -f null -

Statistics for the audio stream are printed at the end of the console log, e.g.

[Parsed_astats_0 @ 0000000003c3aec0] Channel: 1
[Parsed_astats_0 @ 0000000003c3aec0] DC offset: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Mean difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] RMS difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Peak level dB: -6153.053111
[Parsed_astats_0 @ 0000000003c3aec0] RMS level dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] RMS peak dB: -3076.526556
[Parsed_astats_0 @ 0000000003c3aec0] RMS trough dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Crest factor: 1.000000
[Parsed_astats_0 @ 0000000003c3aec0] Flat factor: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Peak count: 662528
[Parsed_astats_0 @ 0000000003c3aec0] Bit depth: 0/0
[Parsed_astats_0 @ 0000000003c3aec0] Dynamic range: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings: 0
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings rate: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Channel: 2
[Parsed_astats_0 @ 0000000003c3aec0] DC offset: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Mean difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] RMS difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Peak level dB: -6153.053111
[Parsed_astats_0 @ 0000000003c3aec0] RMS level dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] RMS peak dB: -3076.526556
[Parsed_astats_0 @ 0000000003c3aec0] RMS trough dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Crest factor: 1.000000
[Parsed_astats_0 @ 0000000003c3aec0] Flat factor: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Peak count: 662528
[Parsed_astats_0 @ 0000000003c3aec0] Bit depth: 0/0
[Parsed_astats_0 @ 0000000003c3aec0] Dynamic range: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings: 0
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings rate: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Overall
[Parsed_astats_0 @ 0000000003c3aec0] DC offset: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Mean difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] RMS difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Peak level dB: -6153.053111
[Parsed_astats_0 @ 0000000003c3aec0] RMS level dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] RMS peak dB: -3076.526556
[Parsed_astats_0 @ 0000000003c3aec0] RMS trough dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Flat factor: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Peak count: 662528.000000
[Parsed_astats_0 @ 0000000003c3aec0] Bit depth: 0/0
[Parsed_astats_0 @ 0000000003c3aec0] Number of samples: 662528

If the RMS level dB is -inf, then that channel is silent.
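
This check can be automated by parsing the astats log. Below is a minimal sketch, assuming the log format shown above; `astats_command`, `rms_levels`, `silent_channels` and `analyze` are hypothetical helper names. The optional `floor_db` threshold is an assumption, not part of the answer: it is there because a comment below reports a nominally silent channel reading -116.051367 dB rather than -inf.

```python
import math
import re
import subprocess

# Captures the text after the "[Parsed_astats_0 @ ...]" log prefix.
_LINE = re.compile(r"\[Parsed_astats_\d+ @ [^\]]+\]\s*(.+?)\s*$")


def astats_command(path, stream=1):
    """Build the ffmpeg call from the answer for audio stream 0:a:<stream>."""
    return ["ffmpeg", "-hide_banner", "-i", path, "-map", f"0:a:{stream}",
            "-af", "astats", "-f", "null", "-"]


def rms_levels(log_text):
    """Map channel number -> 'RMS level dB' value from an astats log."""
    levels = {}
    channel = None
    for raw in log_text.splitlines():
        m = _LINE.search(raw.strip())
        if not m:
            continue
        payload = m.group(1)
        if payload.startswith("Channel:"):
            channel = int(payload.split(":", 1)[1])
        elif payload == "Overall":
            channel = None  # skip the combined statistics
        elif payload.startswith("RMS level dB:") and channel is not None:
            levels[channel] = float(payload.split(":", 1)[1])  # handles "-inf"
    return levels


def silent_channels(levels, floor_db=None):
    """Channels at -inf, or at/below floor_db when a threshold is given."""
    return [
        ch for ch, db in sorted(levels.items())
        if db == -math.inf or (floor_db is not None and db <= floor_db)
    ]


def analyze(path, stream=1, floor_db=None):
    """Run ffmpeg (must be installed); astats output goes to stderr."""
    result = subprocess.run(astats_command(path, stream),
                            capture_output=True, text=True, check=True)
    return silent_channels(rms_levels(result.stderr), floor_db)
```

For example, `analyze("2ch.mp4", stream=1)` would list the channels of the second audio stream that report exactly -inf, while passing something like `floor_db=-90.0` (an arbitrary illustrative value) would also flag channels that are only nearly silent.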

Gyan
  • Thanks for your hints! Yes, that's very quick also. Looks like there are some other differentiating fields as well, like `DC offset=0` for the silent ones. Wondering which channel are you specifying right now with `0:a:1`? Is that a `center` channel? – Tina J Feb 10 '19 at 15:12
  • `0:a:1` specifies the 2nd stream, which has 2 channels. The readings for each channel are given separately. Since each track is coded as stereo, the original channel assignments are lost. You would have to guess or make use of other means to get that info. – Gyan Feb 10 '19 at 15:54
  • I see. So `0:a:1` specifies second stream, channel 0. Right? – Tina J Feb 10 '19 at 16:00
  • No, if you select a stream, all channels in it will be selected. All channels in a stream are always served together. – Gyan Feb 10 '19 at 16:12
  • lol. I thought first argument in `0:a:1` specifies `ch` and last one is stream number! – Tina J Feb 10 '19 at 16:15
  • can you help with this similar question as well? https://stackoverflow.com/questions/54662733/ffmpeg-check-channels-of-a-7-1-audio-for-silence – Tina J Feb 13 '19 at 04:51
  • I got a file https://www.dropbox.com/s/8udvqjqxrew4g0w/NoChannel34.mp4 where channel 3 and 4 are silent, but `RMS level dB: -116.051367`. It doesn't say `-inf`. What should I look at now? – Tina J Sep 28 '20 at 16:37