Is it possible to store multiple video segments, each with its own parameters, in one track of an MP4 file?

Question

I want to encode a sequence of video frames (FHD) into an H.264 stream like this: from time t1 to time t2, encode with the "main" profile, FHD, at 30 fps; from time t3 to time t4, encode with the "high" profile, HD (scaled), at 15 fps; from time t5 to time t6, encode with the "main" profile, FHD, at 30 fps.

Note: t1 < t2 < t3 < t4 < t5 < t6.

My question is: while complying with the MP4 standard, is it possible to put video streams encoded with different parameters into the same video track of an MP4 file? If it is impossible, what is the best alternative?

Rudolfs Bundulis · Answer 1 · 2019-06-15T09:45:45.987

Yes, at least according to the specification. ISO/IEC 14496-15 (3rd edition) contains a definition of a parameter set track:

A sync sample in a parameter set track indicates that all parameter sets needed from that time forward in the video elementary stream are in that or succeeding parameter stream samples. Also there shall be a parameter set sample at each point a parameter set is updated. Each parameter set sample shall contain exactly the sequence and picture parameter sets needed to decode the relevant section of the video elementary stream.

As I understand it, in this case, instead of writing the initial SPS/PPS data into the avcC box in stbl, you write a separate track containing the changing SPS/PPS data as sync samples. So at least according to the spec, you would have samples in that stream with presentation times t1,t2,t3,t4,t5, and the samples themselves would contain the updated SPS/PPS data. This quote from the same standard seems to agree:

Parameter sets: If a parameter set elementary stream is used, then the sample in the parameter stream shall have a decoding time equal or prior to when the parameter set(s) comes into effect instantaneously. This means that for a parameter set to be used in a picture it must be sent prior to the sample containing that picture or in the sample for that picture.

NOTE Parameter sets are stored either in the sample descriptions of the video stream or in the parameter set stream, but never in both. This ensures that it is not necessary to examine every part of the video elementary stream to find relevant parameter sets. It also avoids dependencies of indefinite duration between the sample that contains the parameter set definition and the samples that use it. Storing parameter sets in the sample descriptions of a video stream provides a simple and static way to supply parameter sets. Parameter set elementary streams on the other hand are more complex but allow for more dynamism in the case of updates. Parameter sets may be inserted into the video elementary stream when the file is streamed over a transport that permits such parameter set updates.
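The timing rule quoted above can be illustrated with a small lookup. This is only a sketch of my reading of the rule, not code from the standard: the function name `active_parameter_set`, the `(dts, label)` tuples, and the millisecond timestamps are all hypothetical, and it ignores parameter set IDs entirely.

```python
from bisect import bisect_right

def active_parameter_set(param_samples, video_dts):
    """Return the label of the parameter-set sample in effect at video_dts.

    param_samples: list of (dts, label) tuples sorted by dts (hypothetical
    model of the samples in a parameter set track).
    A parameter-set sample applies when its decoding time is equal to or
    earlier than the decoding time of the video sample.
    """
    times = [dts for dts, _ in param_samples]
    idx = bisect_right(times, video_dts) - 1  # latest sample with dts <= video_dts
    if idx < 0:
        raise ValueError("no parameter set sent before this video sample")
    return param_samples[idx][1]

# Hypothetical timeline in milliseconds for the three segments in the question.
params = [(0, "main/FHD/30"), (10_000, "high/HD/15"), (20_000, "main/FHD/30")]
print(active_parameter_set(params, 12_345))
```

With this model, a muxer would emit one parameter-set sample at each switch point, with a decoding time no later than that of the first IDR picture that uses the new sets.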

ISO/IEC 14496-15 (3rd edition) also defines additional avc3 / avc4 sample entries which, when used, should allow you to write the parameter sets in-band with the video NAL units:

When the sample entry name is 'avc3' or 'avc4', the following applies:

If the sample is an IDR access unit, all parameter sets needed for decoding that sample shall be included either in the sample entry or in the sample itself.

Otherwise (the sample is not an IDR access unit), all parameter sets needed for decoding the sample shall be included either in the sample entry or in any of the samples since the previous random access point to the sample itself, inclusive.
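To make the in-band option concrete, here is a minimal sketch of turning an Annex B access unit (start-code delimited) into the length-prefixed sample layout used inside an MP4 sample, keeping SPS and PPS in front of the IDR slice. The helper names are hypothetical, a 4-byte length field is assumed (the real size is signalled in the sample entry's decoder configuration), and the sketch relies on NAL units never ending in a zero byte.

```python
import struct

START_CODE = b"\x00\x00\x01"

def split_annexb(stream: bytes) -> list:
    """Split an Annex B byte stream into raw NAL units (start codes removed)."""
    nals = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = nxt if nxt != -1 else len(stream)
        # Trailing zeros belong to the next 4-byte start code or to padding.
        nal = stream[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        pos = nxt
    return nals

def nal_type(nal: bytes) -> int:
    """Low 5 bits of the first byte: 7 = SPS, 8 = PPS, 5 = IDR slice."""
    return nal[0] & 0x1F

def to_length_prefixed(nals) -> bytes:
    """Concatenate NAL units, each preceded by a 4-byte big-endian length (assumed size)."""
    return b"".join(struct.pack(">I", len(n)) + n for n in nals)

# Toy access unit with fake payload bytes: SPS, PPS, then an IDR slice.
access_unit = (b"\x00\x00\x00\x01\x67\xAA"
               b"\x00\x00\x00\x01\x68\xBB"
               b"\x00\x00\x01\x65\xCC\xDD")
nals = split_annexb(access_unit)
sample = to_length_prefixed(nals)
print([nal_type(n) for n in nals], sample.hex())
```

With an 'avc3' sample entry, a sample built this way at each switch point would carry the new SPS/PPS ahead of the first IDR picture that references them.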

A different question is how many players actually honor this, even though the standard allows at least two ways to achieve it (in-band with avc3, out-of-band with a parameter set track). I'd assume that looking into the ffmpeg sources to see whether this is supported is a good start.

The answers in this question also suggest that many demuxers honor only the avcC box and not a separate parameter set track. However, a couple of quick Google searches show that both the VLC and ffmpeg forums and newsletters mention these terms, so I'd say it's best to mux such a file and simply check what happens.
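When checking what a muxer actually produced, it helps to confirm which sample entry type ended up in the file. The following is a rough sketch of a box walker that descends moov/trak/mdia/minf/stbl and lists the four-character codes found in stsd. The function names are my own, and the parser is deliberately simplified: it does not handle fragmented files or every box variant.

```python
import struct

# Container boxes on the path from the file root to the sample description.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}

def iter_boxes(data, start=0, end=None):
    """Yield (type, body_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit "largesize" follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated box: stop rather than guess
        yield box_type, pos + header, pos + size
        pos += size

def sample_entry_types(data, start=0, end=None):
    """Return the sample entry four-character codes found in all stsd boxes."""
    found = []
    for box_type, body, box_end in iter_boxes(data, start, end):
        if box_type in CONTAINERS:
            found += sample_entry_types(data, body, box_end)
        elif box_type == b"stsd":
            # stsd is a FullBox: 4 bytes version/flags, then 4 bytes entry_count.
            for entry_type, _, _ in iter_boxes(data, body + 8, box_end):
                found.append(entry_type.decode("ascii", "replace"))
    return found
```

A possible use is `sample_entry_types(open(path, "rb").read())`, where `path` is a placeholder for the muxed file. Seeing `avc3` rather than `avc1` in the result would indicate that the muxer chose the in-band variant.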

Thank you for the useful information. As I understand from the spec you quoted and your comment: "I need to store the parameter set samples in a second track while the first track stores the video stream. So in my case, described above, there should be 3 parameter set samples corresponding to the 3 times the encoder parameters change (t1, t3, t5), and the dts of each sample must be aligned to the dts of the corresponding IDR frame in the video track at which the parameters change." Please correct me if I am wrong. — SteveH, Jun 15 '19 at 13:18
Yeah, it seems so. But you could also try using the avc3 box and put the SPS/PPS in mdat right before the IDR that references the new sets - in this case the first IDR at t2, t3, t4. I have written an MP4 muxer before, and I remember trying inline SPS/PPS with the avcC box, in which case WMP ignored them. So either try the separate stream, or in-band with avc3. — Rudolfs Bundulis, Jun 15 '19 at 13:26
I kind of like the idea of a separate stream, since if I wrote a demuxer that would be easier to parse, but the main question is what works with most players. — Rudolfs Bundulis, Jun 15 '19 at 13:33
Regarding the player, I am not worried about it because I will write the player myself. I am a little confused: if I have a video stream in Annex B format, with the SPS/PPS embedded in the stream, is it still necessary to have a separate parameter set track and an avcC box in the MP4? — SteveH, Jun 15 '19 at 13:45
If you make the player yourself, then you could simply embed them and skip the parameter set track. I thought you wanted compliance with other players. Still, I'd use avc3 instead of avcC to get conformance with the specification. — Rudolfs Bundulis, Jun 15 '19 at 13:50
