
As the title suggests, I'm wondering whether it's possible to send metadata about the stream contents periodically in a fragmented MP4 live stream.

I'm using the following command (1) to get fragmented MP4:

ffmpeg -i rtsp://admin:12345@192.168.0.157 -c:v copy -an -movflags empty_moov+omit_tfhd_offset+frag_keyframe+default_base_moof -f mp4 ...

My main program reads the fragments produced by this command from either stdout or a (Unix domain) socket and gets:

ftyp
moov
moof
mdat
moof
mdat
moof
mdat 
...

So, the first fragments I get are ftyp and moov, which are metadata describing the stream contents.

Now a client program connects to the main program at a later time. The problem is that, at that point, the ftyp and moov fragments are long gone.

Is there a way (i.e. an ffmpeg command option) to make this work similarly to MPEG-TS (the MPEG transport stream) and resend the metadata periodically with the stream? Like this:

ftyp
moov
moof
mdat
moof
mdat
moof
mdat 
ftyp
moov
moof
mdat
moof
mdat
moof
mdat 
...

.. or is my only option to cache the ftyp and moov packets in my main program and re-send them to the client program when it requests the stream?

A related link: What exactly is Fragmented mp4 (fMP4)? How is it different from normal mp4?

Caching and re-sending ftyp and moov each time a new client connects is not that straightforward either, as it somehow breaks the stream (at least the browsers' MSE extensions don't like such a stream). There seem to be lots of sequence numbers and similar fields in the moof packets that would need to be modified. (+)

Another option is to pass the stream through another FFmpeg process that does the remuxing (and corrects the moof packets). Things are further complicated by the fact that command (1) does not emit cleanly separated ftyp, moov, moof, etc. packets, so the receiver first has to split the byte stream at box boundaries itself.
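
As a hedged sketch of that splitting-and-caching approach (not from the original post): in Node.js, assuming the main program reads ffmpeg's stdout on its own stdin, the byte stream can be cut at box boundaries using the 4-byte big-endian size and 4-byte type that start every top-level box. The names broadcast and onClientConnect are placeholders, and the size == 0 / size == 1 (largesize) cases are ignored.

// Sketch: split an fMP4 byte stream into top-level boxes, cache the init
// segment (ftyp + moov), and prepend it for clients that connect later.
// Run as, e.g.: ffmpeg ... -f mp4 pipe:1 | node splitter.js
let pending = Buffer.alloc(0);   // bytes that do not yet form a complete box
let initBoxes = [];              // collected ftyp + moov boxes
let initSegment = null;          // Buffer.concat(initBoxes) once moov is seen

process.stdin.on('data', (chunk) => {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= 8) {
        const size = pending.readUInt32BE(0);          // big-endian box size
        const type = pending.toString('ascii', 4, 8);  // box type, e.g. "moof"
        if (size < 8) { break; }                       // size==0 / largesize not handled here
        if (pending.length < size) { break; }          // wait for the rest of the box
        onBox(type, pending.subarray(0, size));
        pending = pending.subarray(size);
    }
});

function onBox(type, box) {
    if (type === 'ftyp' || type === 'moov') {
        initBoxes.push(Buffer.from(box));              // copy: pending's buffer is transient
        if (type === 'moov') { initSegment = Buffer.concat(initBoxes); }
    } else {
        broadcast(box);                                // moof / mdat go to connected clients
    }
}

function onClientConnect(client) {                     // placeholder client handling
    if (initSegment !== null) { client.send(initSegment); }
}

function broadcast(box) { /* forward to all connected websocket clients */ }

Note that, as the answer below explains, prepending the cached ftyp/moov is not by itself enough for MSE: the first moof the client feeds in must also have first_sample_flags_present set.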

Any thoughts / solutions appreciated.

EDIT: regarding (+), MSE seems to have problems playing fragmented MP4 with gaps: https://bugs.chromium.org/p/chromium/issues/detail?id=516114

El Sampsa

2 Answers


I was finally able to feed fragmented MP4 to browser MSE extensions without problems.

If one starts feeding the MSE extensions with moof and mdat packets that did not come immediately after the original ftyp and moov, then ..

.. the very first moof packet that goes into the MSE extension must be a moof packet that has a special flag called first_sample_flags_present set (see the ISO/IEC 14496-12:2012(E) specs for more info)

.. otherwise the MSEs in all popular browsers freeze and there is no video playback (btw, moof sequence numbers starting from > 1 posed no problem at all).

This python package was very useful for the analysis: https://github.com/beardypig/pymp4

To pick up that flag, client-side JavaScript functions are provided below in this answer.

Use the function getBox to find out the type of the box (ftyp, moov, moof, etc.).

For moof boxes, apply the function findFirstSampleFlag to see if the moof box has the first_sample_flags_present enabled.

function toInt(arr, index) { // From bytes to big-endian 32-bit integer.  Input: Uint8Array, index
    var dv = new DataView(arr.buffer, 0);
    return dv.getInt32(index, false); // big endian
}

function toString(arr, fr, to) { // From bytes to string.  Input: Uint8Array, start index, stop index.
    return String.fromCharCode.apply(null, arr.slice(fr,to));
}

function getBox(arr, i) { // input Uint8Array, start index
    return [toInt(arr, i), toString(arr, i+4, i+8)]
}

function getSubBox(arr, box_name) { // input Uint8Array, box name
    var i = 0;
    var res = getBox(arr, i);
    var main_length = res[0];        // length of the enclosing box
    var name = res[1];               // .. and its name
    i = i + 8;

    var sub_box = null;

    while (i < main_length) {
        res = getBox(arr, i);
        var l = res[0];
        name = res[1];

        if (box_name == name) {
            sub_box = arr.slice(i, i + l);
        }
        i = i + l;
    }
    return sub_box;
}

function findFirstSampleFlag(arr) { // input Uint8Array (a moof box)
    // [moof [mfhd] [traf [tfhd] [tfdt] [trun]]]

    var traf = getSubBox(arr, "traf");
    if (traf == null) { return false; }

    var trun = getSubBox(traf, "trun");
    if (trun == null) { return false; }

    // ISO/IEC 14496-12:2012(E) .. pages 5 and 58-59
    // bytes: (size 4), (name 4), (version 1), (tr_flags 3)
    var flags = trun.slice(9, 12);   // the three tr_flags bytes
    var f = flags[2] & 4;            // 0x000004 = first-sample-flags-present
    return f == 4;
}
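
A small usage sketch (not part of the original answer), assuming ws is an already-open WebSocket delivering the fMP4 stream and that each message starts on a box boundary:

ws.binaryType = "arraybuffer";
ws.onmessage = function(event) {
    var arr = new Uint8Array(event.data);
    var i = 0;
    while (i + 8 <= arr.length) {
        var res = getBox(arr, i);                      // [box length, box name]
        var len = res[0], name = res[1];
        if (len < 8) { break; }                        // malformed box, stop scanning
        if (name == "moof") {
            var ok = findFirstSampleFlag(arr.slice(i, i + len));
            console.log("moof has first_sample_flags_present:", ok);
        }
        i = i + len;
    }
};

A moof for which findFirstSampleFlag returns true is a suitable first media fragment to append to the SourceBuffer right after the cached ftyp and moov.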
El Sampsa
  • Thank you, your code put me in the right direction. In my case I am broadcasting audio only inside MP4, and even the first MOOF packet always has TRUN flags = 1 (the same as all subsequent packets). What I had to do in my websockets server to make clients able to play fMP4 from any point is to adjust the "base decode time" value inside the "tfdt" box. It is located at offset 12 from the beginning of the tfdt box and is a 64-bit unsigned integer. It must start from 0 for any newly connected client and has to be incremented by the value of "default sample duration" specified in the "tfhd" box. – Kirill Gribunin Mar 04 '19 at 12:25 (a sketch of this adjustment follows these comments)
  • @El Sampsa: I'm in the same situation as you were with this. I want to send MP4 boxes to MSE, but not from the beginning. I found out that in my stream the `first_sample_flags_present` flag is set. So I'm doing this: save the ftyp and moov, wait some time, then start a new MSE. First feed the pre-saved ftyp and moov, then feed the stream (moof and mdat). My trun flags are 5 (data-offset-present and first-sample-flags-present), however "there is no video playback". Do you have any idea what I could be missing? https://stackoverflow.com/questions/60580531/fmp4-moof-box-sequence-number-ordering – Daniel Mar 07 '20 at 20:20
  • After your page has been loaded for the first time and a brand new MSE has been instantiated, be careful to feed the first packets in this order: ftyp, moov, special moof packet (with the first sample flag). After that, don't ever feed ftyp or moov packets again. Also, when you call addSourceBuffer, the codec parameters must be compatible with the type of stream you're feeding to MSE, i.e. video only or video+audio; otherwise it doesn't work. And one more thing: MSE is not implemented in iOS on the iPhone, if you're trying that. – El Sampsa Mar 08 '20 at 09:10
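
A minimal sketch (not from the thread) of the tfdt adjustment described in the comment above: it walks the children of a moof box, descends into the first traf, and rewrites baseMediaDecodeTime in place (64-bit at byte offset 12 of the tfdt for version 1, 32-bit for version 0). The function name rewriteTfdt and the BigInt argument are assumptions for illustration.

// Rewrite baseMediaDecodeTime inside a moof's tfdt box so that a newly
// connected client sees timestamps starting from 0 (first traf only).
// moof: writable Uint8Array holding one complete moof box.
// newBaseTime: BigInt with the desired base decode time.
function rewriteTfdt(moof, newBaseTime) {
    var dv = new DataView(moof.buffer, moof.byteOffset, moof.byteLength);
    var i = 8;                                          // skip the moof header (size 4 + type 4)
    while (i + 8 <= moof.length) {
        var size = dv.getUint32(i, false);              // big-endian box size
        var name = String.fromCharCode.apply(null, moof.subarray(i + 4, i + 8));
        if (size < 8) { return false; }                 // malformed box, bail out
        if (name == "traf") { i = i + 8; continue; }    // descend into traf
        if (name == "tfdt") {
            var version = moof[i + 8];                  // tfdt: size 4 + type 4 + version 1 + flags 3
            if (version == 1) {
                dv.setBigUint64(i + 12, newBaseTime, false);       // 64-bit, big endian
            } else {
                dv.setUint32(i + 12, Number(newBaseTime), false);  // version 0: 32-bit
            }
            return true;
        }
        i = i + size;                                   // skip sibling boxes (mfhd, tfhd, trun, ...)
    }
    return false;
}

As the comment notes, each client's base time would start at 0 and be advanced per fragment using the default sample duration from the tfhd box.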

The ftyp/moov pair forms what is called the "initialisation segment" and should only be written to MSE when changing streams. This is usually handled by including the URL of the init segment in the manifest, and it's the player's job to request it when joining the stream.
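
For illustration (not part of the original answer), this is how a fragmented-MP4 media playlist in HLS (RFC 8216) references the init segment with an EXT-X-MAP tag; the file names are placeholders:

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-MAP:URI="init.mp4"
#EXTINF:4.000,
segment0.m4s
#EXTINF:4.000,
segment1.m4s

The player requests init.mp4 once when it joins the stream and appends it to MSE before any of the media segments.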

szatmary
  • Hi, thanks for the reply. Could you please explain in more detail what "including the URL of the init in the manifest" means? – El Sampsa Jan 15 '19 at 12:14
  • It's documented in RFC 8216. – szatmary Jan 15 '19 at 16:02
  • I'm a bit confused .. I'm actually sending fragmented MP4 to the browser via websockets. I'm not sure where/how I should mix in the playlists.. Any link to a primer appreciated – El Sampsa Jan 15 '19 at 16:49
  • You are doing something non standard, there is no primer. But what you are doing is similar to flv/rtmp where on connect, you send the sequence headers (init) once, then start sending fragments. – szatmary Jan 15 '19 at 16:55
  • Yes, that's it. I am sending low-latency live video through websockets. I'm not creating a manifest that tells the browser where to download media segments. The problem is that caching ftyp and moov and sending them later in time somehow creates a broken stream that the MSE does not like – El Sampsa Jan 15 '19 at 17:00
  • Then there is something wrong with the video files. Or there is a bug in the logic delivering it. – szatmary Jan 15 '19 at 17:12