I'm trying to understand how Android's MediaExtractor parses H.264 video stored inside a container format.
If I examine the H.264 stream, I see that it consists of NAL units, each preceded by the start code 00 00 00 01.
The samples returned by MediaExtractor are exactly those NAL units, each beginning with that start code, with one exception: for my particular data source, the first three NAL units are concatenated into a single sample. The first two of those NAL units are very short (29 and 8 bytes).
Why does that concatenation happen? If I were to parse the H264 by hand, how would I know to do that concatenation?
For the first three NAL units, the byte following the start code prefix is 103, 104, and 101 decimal (0x67, 0x68, 0x65). For most of the following NAL units, it's 65 (0x41), and occasionally 101.
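
For reference, here is a minimal sketch of how those bytes can be inspected. It assumes an Annex B byte stream that uses only the 4-byte start code 00 00 00 01 (a real stream may also use the 3-byte form 00 00 01, which this does not handle), and it assumes the byte after the start code is the one-byte NAL unit header defined in the H.264 specification: 1 forbidden bit, 2 bits of nal_ref_idc, 5 bits of nal_unit_type. The function names `split_nal_units` and `describe_header` are hypothetical helpers, not part of any Android API.

```python
START_CODE = b"\x00\x00\x00\x01"

# Names for the nal_unit_type values that appear in this stream,
# taken from the NAL unit type table in the H.264 specification.
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    7: "sequence parameter set (SPS)",
    8: "picture parameter set (PPS)",
}


def split_nal_units(stream):
    """Split on 4-byte start codes; returned NAL units exclude the start code.

    Assumption: only the 4-byte start code is used in the stream.
    """
    return [part for part in stream.split(START_CODE) if part]


def describe_header(first_byte):
    """Return (nal_ref_idc, nal_unit_type, name) for a NAL header byte."""
    nal_ref_idc = (first_byte >> 5) & 0x03   # bits 6..5
    nal_unit_type = first_byte & 0x1F        # bits 4..0
    name = NAL_TYPE_NAMES.get(nal_unit_type, "other")
    return nal_ref_idc, nal_unit_type, name


if __name__ == "__main__":
    # The header bytes observed in the question.
    for value in (103, 104, 101, 65):
        print(value, hex(value), describe_header(value))
```

Under those assumptions, 103, 104, and 101 decode to nal_unit_type 7, 8, and 5 (SPS, PPS, IDR slice), and 65 decodes to nal_unit_type 1 (non-IDR slice).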