
I'm trying to live stream H.264 content to HTML5 using the Media Source Extensions API.

The following method works pretty well:

ffmpeg -i rtsp://10.50.1.29/media/video1 -vcodec copy -f mp4 -reset_timestamps 1 -movflags frag_keyframe+empty_moov -loglevel quiet out.mp4

and then: MP4Box -dash 1000 -frag 1000 -frag-rap out.mp4

I can take the MP4Box output (out_dashinit.mp4) and send it through WebSockets, chunk by chunk, to a JavaScript client that feeds it to the Media Source API, roughly as sketched below.
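For reference, a minimal sketch of the sending side. It assumes Node.js with the `ws` npm package; the port and chunk size are arbitrary choices for illustration:

```js
// Minimal sender sketch (Node.js, assuming the `ws` npm package):
// read the MP4Box output and push it to the client chunk by chunk.
const fs = require('fs');
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (socket) => {
  // 64 KiB chunks are an arbitrary choice for this sketch.
  const stream = fs.createReadStream('out_dashinit.mp4', { highWaterMark: 64 * 1024 });
  stream.on('data', (chunk) => socket.send(chunk));
  stream.on('end', () => socket.close());
});
```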

However, this is not a good method for live content.

What I'm trying to do now is to create a single pipeline that does this in real time, with the minimum possible latency. With FFmpeg it's possible to redirect the output to stdout instead of out.mp4 and grab the content as it is produced (see the sketch below). What I couldn't figure out is whether it's possible to combine MP4Box into the pipeline.
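For example, a sketch of grabbing the stream from stdout in Node.js, reusing the command above but writing to `-` instead of out.mp4:

```js
// Sketch: run ffmpeg with its output redirected to stdout and capture the bytes.
// Same command as above, but the output file is replaced by '-'.
const { spawn } = require('child_process');

const ffmpeg = spawn('ffmpeg', [
  '-i', 'rtsp://10.50.1.29/media/video1',
  '-vcodec', 'copy',
  '-f', 'mp4',
  '-reset_timestamps', '1',
  '-movflags', 'frag_keyframe+empty_moov',
  '-loglevel', 'quiet',
  '-', // write the muxed output to stdout instead of out.mp4
]);

ffmpeg.stdout.on('data', (chunk) => {
  // Each chunk is a piece of the fragmented MP4 stream; forward it to clients here.
});
```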

  1. Can MP4Box take its input from a source that is not a file?
  2. Can MP4Box consume such content progressively (from a file or another source) while it arrives in real time, i.e., wait a little if the stream stops for a second and then resume automatically?
  3. Same question, but for the output: can it write to something that is not a file (such as stdout), and can it do so progressively, so that whenever output data is ready I can take it and transfer it to the web client, essentially generating a never-ending DASHed MP4?
galbarm

2 Answers


You don't need MP4Box to generate the required output, but you'll need to chunk the content yourself by looking for the boxes in the generated file.

Basically, you'll generate an fMP4 with H.264 and send the browser the moov box for initialization, then the moof+mdat boxes for each MP4 fragment you generate. You'll have to code the player in JavaScript; you probably won't be able to use a standard DASH player.
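For illustration, a minimal sketch of such a browser-side player, assuming the server sends the moov box first and then moof+mdat pairs over a WebSocket; the codec string and the URL are assumptions and must match your actual H.264 profile/level and setup:

```js
// Minimal browser-side sketch: feed WebSocket chunks into Media Source Extensions.
const video = document.querySelector('video');
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', () => {
  // The codec string below is an assumption; adjust it to your stream.
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
  const queue = [];

  // appendBuffer is asynchronous: only one append may be in flight at a time.
  sourceBuffer.addEventListener('updateend', () => {
    if (queue.length > 0 && !sourceBuffer.updating) {
      sourceBuffer.appendBuffer(queue.shift());
    }
  });

  const socket = new WebSocket('ws://localhost:8080');
  socket.binaryType = 'arraybuffer';
  socket.onmessage = (event) => {
    if (sourceBuffer.updating || queue.length > 0) {
      queue.push(event.data); // queue until the previous append finishes
    } else {
      sourceBuffer.appendBuffer(event.data);
    }
  };
});
```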

To generate the correct fragmented MP4, you need to pass this to ffmpeg: -movflags empty_moov+omit_tfhd_offset+frag_keyframe+default_base_moof.

Be sure to use the latest version available.

Pablo Montilla
  • Fantastic. It works! Do you also have a suggestion for reducing the latency? I currently have about 3-4 seconds of latency. – galbarm Jun 04 '15 at 13:16
  • You have to work with the parameters available for x264. The first one I'd check is `-tune zerolatency`, and work my way from there. – Pablo Montilla Jun 04 '15 at 13:39
  • But I don't transcode the video. As you can see I use -vcodec copy and I know for sure that the video is arriving with almost zero latency. – galbarm Jun 04 '15 at 13:41
  • Also, I've just learned that there's a DASH profile for live streaming (urn:mpeg:dash:profile:isoff-live:2011), so maybe you can tweak your output and use a standard player for that. – Pablo Montilla Jun 04 '15 at 13:41
  • I see now that you copy video... I don't know then... ;) You shouldn't have much latency. Try changing `-frag_duration` to a small value; maybe that way you get better latency. – Pablo Montilla Jun 04 '15 at 13:43
  • The camera sends video with a key frame interval of exactly 1 sec. Since I use frag_keyframe, I expect the latency to be roughly 1 sec plus some short delta. frag_duration didn't help. I will keep investigating and update if I find something. – galbarm Jun 04 '15 at 13:50
  • @galbarm Can you please post the full ffmpeg and MP4Box commands you are finally using? I have a similar environment and I can't get the video to play using MediaSource. I assume I'm not encoding correctly. – Silvia Jun 12 '15 at 19:32
  • @Silvia As Pablo suggested, I'm no longer using MP4Box since I got everything set up with ffmpeg. I'm still struggling with the latency issue, but other than that it is working well. The ffmpeg command is: "ffmpeg -i rtsp://172.20.28.52:554/h264 -vcodec copy -an -f mp4 -reset_timestamps 1 -movflags empty_moov+default_base_moof+frag_keyframe -loglevel quiet -". I'm grabbing the ffmpeg output through stdout and streaming it to the web using WebSockets. – galbarm Jun 14 '15 at 13:29
  • I posted a follow-up question for the latency issue here: http://stackoverflow.com/questions/30868854/flush-latency-issue-with-fragmented-mp4-creation-in-ffmpeg – galbarm Jun 16 '15 at 13:23
  • @galbarm What library do you use on the client side (JS)? Just simply drawing it on canvas? Can you please provide some code? Thanks. – Zsolt Aug 31 '15 at 14:08
  • @galbarm Do you need to split the frames in any way (e.g. exactly one MP4 fragment per frame) when you send them via WebSocket? Or will any number of bytes work, since the video/MediaSource reconstructs fragments correctly? I am trying to do the exact same thing, but it works only 10% of the time. – Philippe Cayouette Sep 11 '15 at 13:32
  • @PhilippeCayouette You need to do it at the fragment level. Basically you can parse the stream of encoded bytes at box boundaries and send the `moof`+`mdat` boxes. Those can be parsed by the media source object once you have initialized it correctly (with a `moov` box). – Pablo Montilla Sep 11 '15 at 13:36
  • @PabloMontilla I followed your suggestion by capturing the moov box at the beginning of the stream and sending it to any new client, followed by the current moof+mdat boxes. This does not seem to work with Chrome's internal video player (however, VLC is able to play a file with a "hole" like that between the moov and the first moof+mdat). It works in Chrome only when there is no "hole" in the sequence of moof+mdat. I even tried modifying the sequence number of moof to start at 1 and increment monotonically, to no avail. Any suggestions? – Philippe Cayouette Sep 12 '15 at 14:15
  • @PhilippeCayouette You can poke around in chrome://media-internals and see if there is an error there. Chrome is a bit picky about the encoder profile it can use with MSE. A file can play correctly on its own and not at all with MSE. – Pablo Montilla Sep 12 '15 at 14:17
  • @PhilippeCayouette Also, be sure to append the moov box alone, and then the media in box pairs. – Pablo Montilla Sep 12 '15 at 14:19
  • @PabloMontilla Could you share a way to split the stream into moov/moof+mdat fragments to send them correctly to the client? – Jamby Jun 12 '16 at 18:16
  • It is not difficult, but it's not trivial either. I'd start by looking at the BMFF file format structure (which is what you'll send to the client). It is basically a stream of boxes with `type`, `length` and `content` (which can be a box itself). If you can parse that structure you are basically set, as sending the correct boxes in tandem is not a problem (see the sketch after these comments). – Pablo Montilla Jun 13 '16 at 13:41
  • @PabloMontilla I'm a little late to the party, but I'm working on doing exactly that, except I'm planning to send data with an RTCDataChannel (essentially UDP) instead of WebSockets, which should technically give me even better latency. I'm quite new at all this video stuff, and I am having a really hard time understanding all this talk of `moof`, `mdat` and `moov` and what I need to do with the MP4 chunks I receive before passing them on to the SourceBuffer. Can you provide some guidance? – snowfrogdev Jan 08 '19 at 17:56
  • I’m not close to a computer now, but if you find a description of the BMFF file format, you’ll find that it is an “easy” format to parse that consists of “boxes” with a type and data. Once you can parse that format you can start sending packets with the correct structure to be used with the sourceBuffer. The use of RTCDataChannel should not be a problem, and should give you better latency (was not an option when I did this work). – Pablo Montilla Jan 08 '19 at 19:55
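For anyone looking for a starting point, here is a minimal sketch of the box splitting described in the comments above. It is not from the answer itself: it assumes Node.js Buffers, handles top-level boxes only, and ignores the rare 64-bit box sizes (size === 1):

```js
// Split a BMFF byte stream into top-level boxes: 4-byte big-endian size,
// then a 4-byte ASCII type, then the payload. Emit `moov` once for
// initialization, then `moof`+`mdat` pairs for each fragment.
let pending = Buffer.alloc(0);

function onData(chunk, emitBox) {
  pending = Buffer.concat([pending, chunk]);
  while (pending.length >= 8) {
    const size = pending.readUInt32BE(0);
    const type = pending.toString('ascii', 4, 8);
    if (size < 8 || pending.length < size) break; // wait for the rest of the box
    emitBox(type, pending.slice(0, size));        // e.g. buffer a moof until its mdat arrives
    pending = pending.slice(size);
  }
}
```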

As far as I understand your solution, you are not streaming but progressively downloading a single MP4 file. Am I understanding that correctly?

I recently started the RTP2DASH project to do real DASH live streaming from an RTP data source. It is still very 'alpha', but it should be easily adaptable to simple use cases.

Sebastian Annies
  • No. My solution is indeed video streaming. Although the data is in fragmented MP4 format, it is never written to a file. – galbarm Sep 14 '15 at 07:55
  • I understood that, but it's still not DASH, right? There is no manifest and no multiple qualities. It's transforming an RTSP stream into a progressive download. – Sebastian Annies Sep 14 '15 at 18:55
  • Correct. It is not an adaptive streaming solution. It is a lowest-possible-latency, single-quality solution. – galbarm Sep 15 '15 at 12:00
  • One of the benefits is that transcoding is not needed for this kind of solution. It is very lightweight; you can potentially stream hundreds of streams from a single server. – galbarm Sep 15 '15 at 12:18
  • Transcoding is not required with my proposal either, BUT you get a DASH stream. I got the impression that using DASH was kind of a requirement; of course, if you do progressive download with the HTML5 video tag it's as lightweight as it gets. I met the http://mistserver.org/ guys at IBC. Their showcase last year was exactly your way of streaming to a few hundred clients from a Raspberry Pi. Might be worth a look! – Sebastian Annies Sep 16 '15 at 09:52