Both of your sample files have 4 audio streams or tracks. Each audio track has 2 channels, with a layout of stereo.
Apparently, the audio encoder is constant bit-rate, and so the metadata cannot be used to distinguish silent tracks from sound-bearing ones.
You'll need to parse each suspect audio stream.
ffmpeg -i file -map 0:a:1 -af astats -f null -
At the end of the console log, statistics for the audio stream will be printed,
e.g.
[Parsed_astats_0 @ 0000000003c3aec0] Channel: 1
[Parsed_astats_0 @ 0000000003c3aec0] DC offset: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Mean difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] RMS difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Peak level dB: -6153.053111
[Parsed_astats_0 @ 0000000003c3aec0] RMS level dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] RMS peak dB: -3076.526556
[Parsed_astats_0 @ 0000000003c3aec0] RMS trough dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Crest factor: 1.000000
[Parsed_astats_0 @ 0000000003c3aec0] Flat factor: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Peak count: 662528
[Parsed_astats_0 @ 0000000003c3aec0] Bit depth: 0/0
[Parsed_astats_0 @ 0000000003c3aec0] Dynamic range: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings: 0
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings rate: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Channel: 2
[Parsed_astats_0 @ 0000000003c3aec0] DC offset: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Mean difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] RMS difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Peak level dB: -6153.053111
[Parsed_astats_0 @ 0000000003c3aec0] RMS level dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] RMS peak dB: -3076.526556
[Parsed_astats_0 @ 0000000003c3aec0] RMS trough dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Crest factor: 1.000000
[Parsed_astats_0 @ 0000000003c3aec0] Flat factor: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Peak count: 662528
[Parsed_astats_0 @ 0000000003c3aec0] Bit depth: 0/0
[Parsed_astats_0 @ 0000000003c3aec0] Dynamic range: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings: 0
[Parsed_astats_0 @ 0000000003c3aec0] Zero crossings rate: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Overall
[Parsed_astats_0 @ 0000000003c3aec0] DC offset: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max level: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Min difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Max difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Mean difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] RMS difference: 0.000000
[Parsed_astats_0 @ 0000000003c3aec0] Peak level dB: -6153.053111
[Parsed_astats_0 @ 0000000003c3aec0] RMS level dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] RMS peak dB: -3076.526556
[Parsed_astats_0 @ 0000000003c3aec0] RMS trough dB: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Flat factor: -inf
[Parsed_astats_0 @ 0000000003c3aec0] Peak count: 662528.000000
[Parsed_astats_0 @ 0000000003c3aec0] Bit depth: 0/0
[Parsed_astats_0 @ 0000000003c3aec0] Number of samples: 662528
If the RMS level dB
is -inf
, then that channel is silent.