
I'm trying to understand how Android's MediaExtractor parses H264 stored inside a container format.

If I examine the H264 stream, I see that it consists of NAL units demarcated by the sequence 00 00 00 01.
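To make the start-code framing concrete, here is a minimal Python sketch of splitting such an Annex B byte stream into NAL units. The helper name `split_annexb` is hypothetical. It assumes start codes may be either four bytes (`00 00 00 01`) or three bytes (`00 00 01`), and it strips trailing zero bytes from each unit. Those zeros are the extra leading `00` of a following 4-byte start code (or zero padding), and H.264 does not allow a NAL unit to end in a `0x00` byte, so stripping them is safe.

```python
START_CODE = b"\x00\x00\x01"  # a 4-byte start code is a 0x00 followed by this

def split_annexb(stream: bytes) -> list[bytes]:
    """Hypothetical helper: return NAL unit payloads with start codes removed."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = nxt if nxt != -1 else len(stream)
        # Trailing zeros belong to the next 4-byte start code or to padding,
        # never to the NAL unit itself.
        units.append(stream[begin:end].rstrip(b"\x00"))
        pos = nxt
    return units

data = bytes.fromhex("00000001 6742001e 00000001 68ce3c80 000001 658884")
print([u.hex() for u in split_annexb(data)])
# ['6742001e', '68ce3c80', '658884']
```

Emulation prevention bytes guarantee that `00 00 01` never occurs inside a NAL unit, which is why a plain byte search is enough here.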

The samples returned by MediaExtractor are exactly those NAL units, each beginning with that marker, except that for this particular data source the first three NAL units are concatenated into a single sample. The first two NAL units are very short (29 and 8 bytes).

Why does that concatenation happen? If I were to parse the H264 by hand, how would I know to do that concatenation?

For the first three NAL units, the byte following the start code prefix is 103, 104, and 101 decimal. For most of the following NAL units, it's 65, and occasionally 101.
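The byte after the start code is the one-byte H.264 NAL unit header: 1 `forbidden_zero_bit`, 2 bits of `nal_ref_idc`, and 5 bits of `nal_unit_type`. A small decoding sketch (the function and table names are hypothetical, and the table lists only a few common types) shows what the values in the question mean:

```python
# A few common nal_unit_type values from the H.264 specification.
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI",
                  7: "SPS", 8: "PPS", 9: "access unit delimiter"}

def parse_nal_header(first_byte: int) -> tuple[int, int, int]:
    """Split the NAL header byte into (forbidden_zero_bit, nal_ref_idc, nal_unit_type)."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01
    nal_ref_idc = (first_byte >> 5) & 0x03
    nal_unit_type = first_byte & 0x1F
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

for b in (103, 104, 101, 65):
    _, ref, t = parse_nal_header(b)
    print(f"{b} (0x{b:02x}): nal_ref_idc={ref}, nal_unit_type={t} ({NAL_TYPE_NAMES.get(t, 'other')})")
# 103 (0x67): nal_ref_idc=3, nal_unit_type=7 (SPS)
# 104 (0x68): nal_ref_idc=3, nal_unit_type=8 (PPS)
# 101 (0x65): nal_ref_idc=3, nal_unit_type=5 (IDR slice)
# 65 (0x41): nal_ref_idc=2, nal_unit_type=1 (non-IDR slice)
```

So the three concatenated units are an SPS, a PPS, and an IDR slice, and the later units are mostly non-IDR slices with an occasional IDR slice.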

Paul Steckler
  • You need to read this: http://stackoverflow.com/questions/24884827/possible-locations-for-sequence-picture-parameter-sets-for-h-264-stream/24890903#24890903 – szatmary Feb 26 '15 at 05:16
  • It looks like non-VCL NAL units are accumulated, then concatenated with the next VCL NAL unit. – Paul Steckler Feb 26 '15 at 07:11
  • I get the same results as MediaExtractor, as long as I remove trailing 0s after each NAL unit, except that I'm getting 2 or 3 extra units at the end of my data that MediaExtractor doesn't return. I'm not seeing an end-of-stream or end-of-sequence unit type, so I'm not sure what's going on. – Paul Steckler Feb 27 '15 at 02:29
  • End of sequence is optional, and very rarely used (if ever). – szatmary Feb 27 '15 at 02:32
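The grouping rule from the comments (accumulate non-VCL units, then emit them together with the next VCL unit) can be sketched as follows. This is a hypothetical illustration of that observed behavior, not MediaExtractor's actual implementation. It assumes one slice per picture; the H.264 specification's real rules for detecting the start of a new picture also compare slice header fields. Types 1 through 5 are the coded slice (VCL) units.

```python
VCL_TYPES = range(1, 6)  # nal_unit_type 1..5 are coded slice (VCL) NAL units
START_CODE = b"\x00\x00\x00\x01"

def group_like_samples(nal_units: list[bytes]) -> list[bytes]:
    """Hypothetical helper: prepend pending non-VCL units to the next VCL unit.

    Assumes one slice per picture. Non-VCL units left over at the end with
    no following slice are simply dropped in this sketch.
    """
    samples = []
    pending = []
    for unit in nal_units:
        if not unit:
            continue  # skip empty units (e.g. from back-to-back start codes)
        pending.append(START_CODE + unit)
        if (unit[0] & 0x1F) in VCL_TYPES:
            samples.append(b"".join(pending))
            pending = []
    return samples

units = [b"\x67\x42", b"\x68\xce", b"\x65\x88", b"\x41\x9a", b"\x41\x9b"]
print(len(group_like_samples(units)))  # 3: [SPS+PPS+IDR], [non-IDR], [non-IDR]
```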

1 Answer


Your question can be answered by understanding how an H264 stream is structured.

Android expects two configuration units, called the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS), before any IDR/non-IDR frames (commonly referred to as I-frames and P-frames).
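As a rough check of that ordering requirement, the hypothetical helper below scans a list of `nal_unit_type` values and reports whether both an SPS (type 7) and a PPS (type 8) appear before the first coded slice (types 1 through 5). It is only a sketch of the ordering condition described here; it does not validate the parameter-set contents or IDs.

```python
def has_config_before_first_slice(nal_types: list[int]) -> bool:
    """Hypothetical helper: True if SPS (7) and PPS (8) both precede the first slice."""
    seen_sps = seen_pps = False
    for t in nal_types:
        if t == 7:
            seen_sps = True
        elif t == 8:
            seen_pps = True
        elif 1 <= t <= 5:  # first coded slice (VCL NAL unit)
            return seen_sps and seen_pps
    return False  # no slice found at all

print(has_config_before_first_slice([7, 8, 5, 1, 1]))  # True
print(has_config_before_first_slice([5, 7, 8]))        # False
```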

The first two NAL units (the SPS and PPS) are concatenated merely for convenience. They are not frames: the hardware codec recognizes them by their NAL unit type and configures itself according to their values. The third unit (the first IDR slice) is included so that the codec can start decoding as soon as this configuration is complete.

TL;DR: Decoding a raw stream like this by hand doesn't require this structure. Instead, you would just analyze each NAL unit individually.

Jk Jensen