ITU-T Rec. H.264 & Annex B
Recommendation H.264 is a video codec standard defined by the International Telecommunication Union, Telecommunication Standardization Sector (ITU-T). It is available free of charge and can be downloaded from their website.
The standard defines a bytestream format, whose lowest level of abstraction is the NALU (Network Abstraction Layer Unit).
32 types of NALUs can exist, although about 11 are reserved or unused. Some carry video slice data, some don’t. Two NALU types will be important later in this discussion: SPS (Sequence Parameter Set) and PPS (Picture Parameter Set). Both are required to decode a video slice, and provide important information about the stream, such as its size and interpretation of the raw data.
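To make the "32 types" concrete: as far as I understand the H.264 syntax, every NALU begins with a one-byte header whose low 5 bits hold the type, with SPS being type 7 and PPS type 8. A minimal sketch of decoding that byte follows; the function name and the small name table are my own, for illustration only.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte H.264 NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # must be 0 in a valid stream
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # 0 means "not used for reference"
        "nal_unit_type": first_byte & 0x1F,              # 5 bits, hence 32 possible types
    }


# Illustrative subset of the type table, not the complete list.
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
}

if __name__ == "__main__":
    for byte in (0x67, 0x68, 0x65):
        fields = parse_nal_header(byte)
        print(hex(byte), fields, NAL_TYPE_NAMES.get(fields["nal_unit_type"], "other"))
```

The bytes 0x67 and 0x68 are the values you will typically see at the start of an SPS and a PPS respectively.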
H.264 leaves undefined how these NALUs are transported and framed. However, it does describe one possible scheme, in the Standard’s own Annex B. This scheme, for want of a better name, is generally referred to as Annex B.
The scheme consists of prefixing each NALU with an easy-to-synchronize-to start code that cannot occur within a NALU: a 3- or 4-byte pattern, 00 00 01 or 00 00 00 01. The rest of the NALU then follows. This scheme is popular in hardware and/or streaming situations because it allows acquiring bit-lock and byte-alignment easily, sends the SPS/PPS “in-band” periodically and thus allows one to tune into the stream at a random point to begin decoding, and has the interesting property that between NALUs one can validly send an arbitrary number of 0 bits or bytes.
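Splitting such a stream back into NALUs can be sketched as follows. This is a simplified illustration rather than a production parser: the function name is hypothetical, it assumes the input is already byte-aligned, and it relies on my understanding that a NALU never ends in a 00 byte, so any zeros in front of a start code can be treated as padding.

```python
import re

# The 3-byte pattern also matches the tail of the 4-byte form 00 00 00 01.
START_CODE = re.compile(b"\x00\x00\x01")


def split_annex_b(stream: bytes) -> list:
    """Return the NALUs found in an Annex B byte stream, without start codes."""
    nalus = []
    matches = list(START_CODE.finditer(stream))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(stream)
        # Zero bytes before the next start code are padding (or the extra
        # leading zero of a 4-byte start code), not NALU payload.
        nalu = stream[match.end():end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
    return nalus


if __name__ == "__main__":
    demo = (
        b"\x00\x00\x00\x01\x67\x42\x00\x1f"   # 4-byte start code + SPS-like bytes
        b"\x00\x00\x01\x68\xce"               # 3-byte start code + PPS-like bytes
        b"\x00\x00\x00\x00\x01\x65\x88\x80"   # extra zero padding + slice-like bytes
    )
    for nalu in split_annex_b(demo):
        print(nalu.hex())
```

Note that the payload bytes in the demo are made up; only the framing is meaningful.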
ISO/IEC 14496 MPEG-4 & AVCC
MPEG-4 is a multi-part family of standards for audio-video coding and storage, produced by a joint group of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) called the Moving Picture Experts Group (MPEG). Only a few parts of the MPEG-4 family are relevant here:
- MPEG-4 Part 10 / Advanced Video Coding (AVC), technically identical to ITU-T H.264. Free of charge.
- MPEG-4 Part 12, ISO Base Media File Format (BMFF), defines a generic binary container file format that can be specialized. Free of charge.
- MPEG-4 Part 14 (MP4), which specializes Part 12 for video in general and defines the .mp4 file extension and format. This part is very expensive (88 Swiss francs) and not available to the public.
- MPEG-4 Part 15, which defines how NALU-structured video data such as Part 10/H.264 video is stored in the Part 12 ISO BMFF. This part is extremely expensive (198 Swiss francs) and not available to the public, but together with Parts 14, 12 and 10 it forms the basis of the commonly used .mp4 container with H.264-coded video.
AVCC
Unfortunately, Part 15 is also the part that defines a new scheme for framing NALUs. This scheme extracts all SPS/PPS NALUs into an “out-of-band” structure called AVCC, and replaces the start code prefix in front of each NALU with an (almost always) 4-byte big-endian number giving the size, in bytes, of the NALU that follows.
This scheme is popular for fast seeking and random access through video data. Because all video decoder configuration data (SPS/PPS) is gathered in one standardized place, one can configure the video decoder once at the beginning and thereafter not worry about surprises like a dynamic change in the size of the video frame (which Annex B allows).
Fortunately, hints about AVCC’s structure exist online, as does code to translate between AVCC and Annex B.
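For orientation, here is a rough sketch of reading the out-of-band AVCC structure. It assumes the layout as I understand it from public descriptions (a record usually referred to as AVCDecoderConfigurationRecord, stored in the avcC box): four bytes of version/profile/level, one byte whose low 2 bits give the length-prefix size minus one, one byte whose low 5 bits give the SPS count, then 16-bit length-prefixed SPS entries, a PPS count byte, and 16-bit length-prefixed PPS entries. The function and key names are hypothetical, and the trailing extension fields that some profiles add are ignored.

```python
import struct


def _read_sets(buf: bytes, pos: int, count: int):
    """Read `count` parameter sets, each prefixed by a 16-bit big-endian length."""
    sets = []
    for _ in range(count):
        (size,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        sets.append(buf[pos:pos + size])
        pos += size
    return sets, pos


def parse_avcc(record: bytes) -> dict:
    """Parse the fixed header and the SPS/PPS lists of an avcC record (assumed layout)."""
    version, profile, compat, level, len_byte, sps_byte = struct.unpack_from(">6B", record, 0)
    sps, pos = _read_sets(record, 6, sps_byte & 0x1F)    # low 5 bits: SPS count
    pps, pos = _read_sets(record, pos + 1, record[pos])  # full byte: PPS count
    return {
        "version": version,
        "profile": profile,
        "compatibility": compat,
        "level": level,
        "nalu_length_size": (len_byte & 0x03) + 1,       # 1, 2 or 4 bytes in practice
        "sps": sps,
        "pps": pps,
    }


if __name__ == "__main__":
    # Made-up record: one 4-byte SPS and one 2-byte PPS.
    demo = bytes([1, 0x42, 0x00, 0x1F, 0xFF, 0xE1]) + b"\x00\x04\x67\x42\x00\x1f" + b"\x01\x00\x02\x68\xce"
    print(parse_avcc(demo))
```

The `nalu_length_size` value is what tells you whether the size prefix in front of each NALU is the usual 4 bytes or something shorter.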
Your needs
You seem to need AVCC -> Annex B conversion. This can be done with FFmpeg’s bitstream filter, h264_mp4toannexb:
ffmpeg -i INPUT.mp4 -codec copy -bsf:v h264_mp4toannexb OUTPUT.ts
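If you want to see what the conversion amounts to conceptually, the sketch below rewrites one length-prefixed sample into Annex B form. It is not FFmpeg's implementation: the function name and parameters are hypothetical, it assumes big-endian size prefixes, and it simply prepends whatever SPS/PPS you pass in, whereas a real converter decides when to re-send them.

```python
START_CODE = b"\x00\x00\x00\x01"


def avcc_to_annex_b(sample: bytes, length_size: int = 4, parameter_sets=()) -> bytes:
    """Replace each size prefix in `sample` with a start code.

    `parameter_sets` (e.g. the SPS and PPS taken from the out-of-band
    AVCC structure) are emitted first so the result is decodable on its own.
    """
    out = bytearray()
    for ps in parameter_sets:
        out += START_CODE + ps
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated size prefix")
        size = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if pos + size > len(sample):
            raise ValueError("NALU size exceeds sample length")
        out += START_CODE + sample[pos:pos + size]
        pos += size
    return bytes(out)


if __name__ == "__main__":
    # Two made-up NALUs of 3 and 2 bytes, each with a 4-byte size prefix.
    sample = b"\x00\x00\x00\x03\x65\x88\x80" + b"\x00\x00\x00\x02\x41\x9a"
    print(avcc_to_annex_b(sample, parameter_sets=[b"\x67\x42", b"\x68\xce"]).hex())
```

For real files, prefer the FFmpeg filter above; the sketch is only meant to show that the two framings carry the same NALUs.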