
After carefully reading the FFmpeg Bitstream Filters documentation, I still do not understand what bitstream filters are really for.

The documentation states that a bitstream filter:

performs bitstream level modifications without performing decoding

Could anyone explain that further? A use case would greatly clarify things. Also, there are clearly several different filters. How do they differ?

Palec
Joe



Let me explain by example. FFmpeg video decoders typically work by converting one video frame per call to avcodec_decode_video2, so the input is expected to be "one image" worth of bitstream data. Let's consider for a second the problem of going from a file (an array of bytes on disk) to images.

For "raw" (Annex B) H.264 (.h264/.bin/.264 files), the individual NAL unit data (SPS/PPS header bitstreams or CABAC-encoded frame data) is concatenated as a sequence of NAL units with a start code (00 00 01 XX) in between, where XX is the NAL unit header byte, whose low 5 bits give the NAL unit type. (To prevent the NAL data itself from containing 00 00 01, it is RBSP-escaped with emulation prevention bytes.) So an H.264 frame parser can simply cut the file at start code markers: it searches for successive packets that start with (and include) 00 00 01 and run until (but exclude) the next occurrence of 00 00 01. Then it parses the NAL unit type and slice header to find which frame each packet belongs to, and returns the set of NAL units making up one frame as input to the H.264 decoder.
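The start-code scan described above can be sketched in C roughly as follows. This is a minimal illustration, not FFmpeg's actual h264 parser: `split_annexb` and `NalUnit` are hypothetical names, and the sketch only cuts the stream into NAL units and reads each nal_unit_type. Grouping NAL units into frames (which needs slice-header parsing) is left out. It stores each NAL body without its start code and trims trailing zero bytes, which also handles 4-byte 00 00 00 01 start codes (an H.264 NAL unit never ends in a zero byte).

```c
#include <stddef.h>
#include <stdint.h>

/* One NAL unit found in an Annex B buffer; the start code is not included. */
typedef struct {
    size_t offset; /* offset of the NAL header byte */
    size_t size;   /* bytes up to (not including) the next start code */
    int type;      /* nal_unit_type: low 5 bits of the header byte */
} NalUnit;

/* Returns the index of the next 00 00 01 prefix at or after pos, or len if none. */
static size_t find_start_code(const uint8_t *buf, size_t len, size_t pos) {
    for (size_t i = pos; i + 3 <= len; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    return len;
}

/* Splits an Annex B byte stream at start codes; returns how many NAL units were stored. */
static size_t split_annexb(const uint8_t *buf, size_t len, NalUnit *out, size_t max_out) {
    size_t count = 0;
    size_t sc = find_start_code(buf, len, 0);
    while (sc < len && count < max_out) {
        size_t begin = sc + 3;                      /* first byte after 00 00 01 */
        size_t next = find_start_code(buf, len, begin);
        size_t end = next;
        /* Drop trailing zeros, e.g. the leading 00 of a following 00 00 00 01. */
        while (end > begin && buf[end - 1] == 0)
            end--;
        if (end > begin) {
            out[count].offset = begin;
            out[count].size = end - begin;
            out[count].type = buf[begin] & 0x1F;
            count++;
        }
        sc = next;
    }
    return count;
}
```

For a toy stream `00 00 00 01 67 … 00 00 00 01 68 … 00 00 01 65 …`, this finds three NAL units of types 7 (SPS), 8 (PPS) and 5 (IDR slice).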

H.264 data in .mp4 files is different, though. The 00 00 01 start code can be considered redundant if the muxing format already has length markers in it, as is the case for mp4, where each NAL unit is preceded by its size. So the 00 00 01 prefix is dropped. The SPS/PPS are also stored in the file header instead of being prepended before the first frame, and these also lack their 00 00 01 prefixes. So if I were to feed this into the H.264 decoder, which expects the prefixes on all NAL units, it wouldn't work. The h264_mp4toannexb bitstream filter fixes this by extracting the SPS/PPS from the file header (FFmpeg calls this "extradata"), prepending the start code to them and to each NAL unit from the individual frame packets, and concatenating everything back together before it is fed to the H.264 decoder.
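Here is a minimal sketch of the two conversions h264_mp4toannexb performs, under stated assumptions. The extradata layout follows the avcC record of ISO/IEC 14496-15 (configurationVersion, profile, compatibility and level bytes, lengthSizeMinusOne in the low 2 bits of byte 4, then the SPS and PPS arrays, each entry with a 16-bit big-endian length). The function names are hypothetical, every NAL unit gets a 4-byte start code, and the caller decides when to emit the converted SPS/PPS; the real filter handles that and several edge cases itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t kStartCode[4] = {0, 0, 0, 1};

/* Writes a start code plus `size` NAL bytes to dst; returns bytes written, 0 if it doesn't fit. */
static size_t put_nal(uint8_t *dst, size_t cap, const uint8_t *nal, size_t size) {
    if (cap < 4 || size > cap - 4) return 0;
    memcpy(dst, kStartCode, 4);
    memcpy(dst + 4, nal, size);
    return size + 4;
}

/* Converts the SPS/PPS in avcC extradata to Annex B. Returns bytes written (0 on error)
   and stores the per-NAL length-field size (normally 1, 2 or 4) in *length_size. */
static size_t avcc_extradata_to_annexb(const uint8_t *ed, size_t ed_len,
                                       uint8_t *dst, size_t cap, int *length_size) {
    if (ed_len < 7 || ed[0] != 1) return 0;        /* configurationVersion must be 1 */
    *length_size = (ed[4] & 0x03) + 1;             /* lengthSizeMinusOne + 1 */
    size_t pos = 5, written = 0;
    for (int pass = 0; pass < 2; pass++) {         /* pass 0: SPS array, pass 1: PPS array */
        if (pos >= ed_len) return 0;
        int count = (pass == 0) ? (ed[pos] & 0x1F) : ed[pos];
        pos++;
        for (int i = 0; i < count; i++) {
            if (pos + 2 > ed_len) return 0;
            size_t n = ((size_t)ed[pos] << 8) | ed[pos + 1];
            pos += 2;
            if (n > ed_len - pos) return 0;
            size_t w = put_nal(dst + written, cap - written, ed + pos, n);
            if (w == 0) return 0;
            written += w;
            pos += n;
        }
    }
    return written;
}

/* Rewrites one length-prefixed (mp4-style) packet as Annex B. Returns bytes written, 0 on error. */
static size_t avcc_packet_to_annexb(const uint8_t *pkt, size_t len, int length_size,
                                    uint8_t *dst, size_t cap) {
    size_t pos = 0, written = 0;
    while (pos < len) {
        if ((size_t)length_size > len - pos) return 0;
        size_t n = 0;
        for (int i = 0; i < length_size; i++)      /* big-endian NAL size */
            n = (n << 8) | pkt[pos + i];
        pos += (size_t)length_size;
        if (n > len - pos) return 0;
        size_t w = put_nal(dst + written, cap - written, pkt + pos, n);
        if (w == 0) return 0;
        written += w;
        pos += n;
    }
    return written;
}
```

A caller would typically convert the extradata once, prepend that output before keyframes, and run every packet through `avcc_packet_to_annexb` before handing it to an Annex B decoder.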

You might now feel that there's a very fine line between a "parser" and a "bitstream filter". This is true. I think the official definition is that a parser takes a sequence of input data and splits it into frames without discarding or adding any data: the only thing a parser does is change packet boundaries. A bitstream filter, on the other hand, is allowed to actually modify the data. I'm not sure this definition holds everywhere (see e.g. VP9 below), but it's the conceptual reason mp4toannexb is a BSF and not a parser (it adds 00 00 01 prefixes).
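The parser/BSF distinction above can be phrased as a checkable property. The hypothetical helper below (not part of FFmpeg) returns true only if a set of output packets, concatenated, reproduces the input bytes exactly, which is all a parser is supposed to do. A filter like h264_mp4toannexb, which inserts start codes, would fail the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A packet as a (pointer, size) view into some buffer. */
typedef struct {
    const uint8_t *data;
    size_t size;
} Packet;

/* True if concatenating pkts[0..n) yields exactly in[0..in_len): only boundaries changed. */
static bool only_boundaries_changed(const uint8_t *in, size_t in_len,
                                    const Packet *pkts, size_t n) {
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        if (pkts[i].size == 0) continue;                 /* empty packets add no bytes */
        if (pkts[i].size > in_len - pos) return false;   /* output is longer than input */
        if (memcmp(in + pos, pkts[i].data, pkts[i].size) != 0) return false;
        pos += pkts[i].size;
    }
    return pos == in_len;                                /* nothing was dropped */
}
```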

Other cases where such "bitstream tweaks" help keep decoders simple and uniform, while still letting us support all the file variants that happen to exist in the wild:

  • mpeg4 (DivX) B-frame unpacking. To store B-frame sequences such as I-B-P (display order), which are coded as I-P-B, in AVI with correct timestamps, people came up with B-frame packing: I-P-B is packed into packets as I-(PB)-(), i.e. the second packet holds two frames and the third is empty. This keeps the timestamps associated with the P and B frames correct at the decoding stage. It also means one packet carries two frames' worth of input data, which violates FFmpeg's one-frame-in, one-frame-out concept, so we wrote a BSF that splits the packet back in two before it is fed to the decoder. Because it also deletes the marker saying that the packet contains two frames, it is a BSF and not a parser. In practice, this solves otherwise hard problems with frame multithreading. VP9 does the same thing (called superframes) but splits the frames in the parser, so the parser/BSF split isn't always theoretically perfect; maybe VP9's splitter should be called a BSF.
  • hevc mp4 to annexb conversion (same story as above, but for hevc)
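  • aac adts to asc conversion (this is basically the same Annex B vs. mp4 story as for h264/hevc, but for aac audio and in the opposite direction: the per-frame ADTS headers are stripped and the codec parameters are stored once as an AudioSpecificConfig in extradata)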
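To make the AAC case concrete, here is a sketch of the core of an ADTS-to-ASC conversion. It is not FFmpeg's aac_adtstoasc implementation, and the names are hypothetical. It reads the profile, sampling-frequency index and channel configuration from one ADTS header and packs them into the 2-byte AudioSpecificConfig (audio object type = ADTS profile + 1) that would go into extradata; each packet's raw AAC payload then starts `header_len` bytes into the ADTS frame. It does not handle channel_configuration 0 (which needs a program config element) or ADTS frames carrying several raw data blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a 7- or 9-byte ADTS header needed to build an AudioSpecificConfig. */
typedef struct {
    int profile;        /* ADTS profile = audio object type - 1 */
    int sf_index;       /* sampling_frequency_index */
    int channel_config; /* channel_configuration */
    size_t header_len;  /* 7 without CRC, 9 with CRC */
    size_t frame_len;   /* aac_frame_length: header plus payload */
} AdtsHeader;

/* Parses the ADTS header at b; returns 0 on success, -1 if there is no valid header. */
static int parse_adts(const uint8_t *b, size_t len, AdtsHeader *h) {
    if (len < 7 || b[0] != 0xFF || (b[1] & 0xF0) != 0xF0) return -1; /* 12-bit syncword */
    int protection_absent = b[1] & 0x01;
    h->profile = (b[2] >> 6) & 0x03;
    h->sf_index = (b[2] >> 2) & 0x0F;
    h->channel_config = ((b[2] & 0x01) << 2) | (b[3] >> 6);
    h->frame_len = ((size_t)(b[3] & 0x03) << 11) | ((size_t)b[4] << 3) | (size_t)(b[5] >> 5);
    h->header_len = protection_absent ? 7 : 9;
    if (h->frame_len < h->header_len || h->frame_len > len) return -1;
    return 0;
}

/* Packs a 2-byte AudioSpecificConfig: 5-bit object type, 4-bit frequency index,
   4-bit channel configuration, then three zero bits (GASpecificConfig flags). */
static void adts_to_asc(const AdtsHeader *h, uint8_t asc[2]) {
    int aot = h->profile + 1;
    asc[0] = (uint8_t)((aot << 3) | (h->sf_index >> 1));
    asc[1] = (uint8_t)(((h->sf_index & 0x01) << 7) | (h->channel_config << 3));
}
```

For an AAC-LC, 44.1 kHz, stereo header such as `FF F1 50 80 02 1F FC`, this yields the ASC bytes `12 10`.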
Ronald S. Bultje
  • Note that bitstream filters and parsers are different than regular video and audio filters in that they operate on the encoded (normally compressed) bitstream, whereas regular video and audio filters operate on the uncompressed video and audio. – mark4o Aug 16 '15 at 14:58
  • Similar to what mark4o is saying, note that when running ffmpeg and not doing a streamcopy (i.e. actually transcoding the video) the bitstream filters get applied after the re-encode. – bhh1988 Nov 16 '15 at 01:52
  • Do bitstream filters have anything to do with stream extraction? For example, can they help extract subtitles or audio streams from a video container faster? – user25 Aug 18 '18 at 19:52
  • Where (or how) can such information be [officially] found? I came to this answer after a very long search. Almost all demuxing/decoding examples cover neither bitstream filtering nor packet parsing. I'm very confused by the fact that I have to somehow figure out special tricks for each possible video format I might encounter, when I'm using ffmpeg, which "takes care of all the hard work for you", precisely to avoid all such stuff... (My context: I'm trying to demux video files with ffmpeg and pass the frame data to NVDEC for decoding) – vsvasya Dec 21 '18 at 04:32
  • If you use ffmpeg the tool, or the standard libavformat/avcodec API, it will take care of this for you. Bitstream filters are automatically inserted and parsers are automatically used when using the official read_frame() and send_packet() API. However, if you use your own demuxers, you need to make sure the output of your demuxer is in the format the decoder expects as input, and that's why the individual BSF/parser APIs are also exposed. – Ronald S. Bultje Dec 21 '18 at 13:16
  • Yes, I use the libavformat API, but it does not automatically filter packets inside read_frame(). I ended up with a solution where I check the packet data for a 00 00 00 01 start code and pass the packet through the filter if it is missing. Maybe this changed in newer ffmpeg versions? (However, we still have to support ffmpeg 2.8 anyway...) – vsvasya Dec 26 '18 at 03:34
  • Ahh, I see. They are applied automatically only in the read_frame()/send_packet() combination, or more precisely somewhere inside send_packet, which is not our case since we use a different decoder (trying to pair it with the standard ffmpeg demuxer). So I'm still searching for a way to avoid needing specific knowledge of which filter to apply to each possible format. The FFmpeg docs are rather scarce, and the source code doesn't make it easy to find the answer without spending a lot of time... – vsvasya Jan 04 '19 at 12:03
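For a setup like vsvasya's (FFmpeg demuxer, non-FFmpeg decoder), one commonly used alternative to scanning every packet for 00 00 00 01 is to inspect the H.264 stream's extradata once: avcC extradata (mp4 style) starts with configurationVersion 0x01, while Annex B extradata starts with a start code. This is a heuristic, not an FFmpeg API, and `h264_needs_annexb_conversion` is a hypothetical name. It also assumes that a stream without extradata carries its SPS/PPS in-band, as raw Annex B streams usually do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heuristic: returns true if the H.264 extradata looks like an avcC record,
   i.e. packets are length-prefixed and need h264_mp4toannexb-style conversion. */
static bool h264_needs_annexb_conversion(const uint8_t *extradata, size_t size) {
    if (extradata == NULL || size == 0)
        return false;               /* assumption: no header means in-band Annex B data */
    if (size >= 3 && extradata[0] == 0 && extradata[1] == 0 &&
        (extradata[2] == 1 || (size >= 4 && extradata[2] == 0 && extradata[3] == 1)))
        return false;               /* already starts with a 3- or 4-byte start code */
    return extradata[0] == 1;       /* avcC configurationVersion */
}
```

The same check would not distinguish HEVC correctly on its own, since hvcC also starts with a configurationVersion of 1; it only tells length-prefixed from start-code-delimited data once the codec is known.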