
I'm using WebRTC to send video from a server to a client browser (using the native WebRTC API and an MCU WebRTC server like Kurento).

Before being sent to clients, each frame of the video contains metadata (like subtitles or any other application content). I'm looking for a way to send this metadata to the client such that it remains synchronized with the time it is actually presented. In addition, I would like to be able to access this data from the client side (in JavaScript).

Some options I thought about:

  • Sending the data over a WebRTC DataChannel. However, I couldn't find a way to ensure that the data sent on the data channel stays synchronized with the video channel (again, I hope to get precision at the level of a single frame).
  • Sending the data manually to the client in some way (WebRTC DataChannel, WebSockets, etc.) with timestamps that match the video's timestamps. However, even if Kurento or other middle servers preserve the timestamp information in the video, according to the following answer there is no application-level way to get the video timestamps from JavaScript: How can use the webRTC Javascript API to access the outgoing audio RTP timestamp at the sender and the incoming audio RTP timestamp at the receiver?. I thought about using the standard video element's timeupdate event, but I don't know if it will work at frame-level precision, and I'm not sure what it means for a live video as in WebRTC.
  • Sending the data manually and attaching it to the video at the application level as another TextTrack, then using the onenter and onexit events to read it synchronously (see the sketch after this list): http://www.html5rocks.com/en/tutorials/track/basics/. It still requires precise timestamps, and I'm not sure how to know what the timestamps are and whether Kurento passes them through as-is.
  • Using the statistics API of WebRTC to manually count frames (using getStats), and hoping that the information provided by this API is precise.
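
For the TextTrack option, here is a minimal client-side sketch, assuming the metadata arrives on a side channel already carrying presentation timestamps (which is exactly the open problem above); the dataChannel, the {start, end, payload} message format, and the handleMetadata function are all hypothetical:

const video = document.querySelector('video');
const track = video.addTextTrack('metadata'); // 'metadata' tracks are never rendered
track.mode = 'hidden';                        // keep cue events firing

dataChannel.onmessage = (event) => {
    // hypothetical message format: presentation times in seconds plus a payload
    const { start, end, payload } = JSON.parse(event.data);
    const cue = new VTTCue(start, end, JSON.stringify(payload));
    cue.onenter = () => {
        // fires when video.currentTime enters [start, end)
        handleMetadata(JSON.parse(cue.text)); // hypothetical handler
    };
    track.addCue(cue);
};

Note that cue enter/exit events fire on the media element's clock, so this gives near-frame precision at best, not a guaranteed single-frame match.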

What is the best way to do this, and how can I solve the problems I mentioned in either approach?

EDIT: Precise synchronization (at a resolution of no more than a single frame) of the metadata with the appropriate frame is required.

  • You will never get perfectly synchronized streams if you separate them. You could implement a buffering system to ensure no forward progress until there is an acceptable buffer available in both streams. Your best bet is to forget the perfect frame-to-frame match; if you want that, then encode it into the video stream as video on the fly. Apart from audio and graphics, I cannot think why you would need such high precision. Once you forget the perfect timing, things get a lot simpler. – Blindman67 May 25 '15 at 04:50
  • Thanks, good point. Anyway, the question is about how to do that programmatically, assuming that I could ensure the metadata stream has already reached the browser before the video stream. Your suggestion to re-encode the video sounds nice, but I still need to match the times of the video stream and the metadata stream; I'm not even sure that the middle server preserves the presentation timestamps. – MaMazav May 25 '15 at 08:59
  • Media streams provide some help. If you are using HTML5 video you can use buffered to return a TimeRanges object that lets you know what has been buffered. The HTMLMediaElement interface provides currentTime as a read/write attribute. You can use it to get the time in seconds of the video. To get the current frame number: `frameNumber = Math.floor(videoElement.currentTime * frameRate);` Writing to currentTime will cause the video to seek to that time. – Blindman67 Jun 01 '15 at 06:28
  • Is there any modern solution to this matter? – Trmotta Jul 05 '22 at 19:34

2 Answers


I suspect the amount of data per frame is fairly small. I would look at encoding it into a 2D barcode image and placing it in each frame in a way that it is not removed by compression. Alternatively, just encode a timestamp the same way.

Then on the player side you look at the image in a particular frame and extract the data out of it; see the sketch below.
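
For illustration, here is a rough sketch of the decode side, assuming the server burned a frame counter into the top-left corner of each frame as a strip of large black/white squares (the strip position, BLOCK size, and BITS count are all assumptions; bigger blocks and higher contrast survive compression better, as the comment below notes):

const BITS = 16;   // bits encoded per frame (assumption)
const BLOCK = 8;   // pixel size of each encoded square (assumption)

const video = document.querySelector('video');
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');

function readFrameCounter() {
    canvas.width = BITS * BLOCK;
    canvas.height = BLOCK;
    // copy only the region of the current frame holding the encoded strip
    ctx.drawImage(video, 0, 0, BITS * BLOCK, BLOCK, 0, 0, BITS * BLOCK, BLOCK);
    const pixels = ctx.getImageData(0, 0, BITS * BLOCK, BLOCK).data;
    let value = 0;
    for (let bit = 0; bit < BITS; bit++) {
        // sample the center pixel of each square; treat dark as 1
        const x = bit * BLOCK + BLOCK / 2;
        const y = BLOCK / 2;
        const luma = pixels[(y * BITS * BLOCK + x) * 4]; // red channel suffices for B/W
        value = (value << 1) | (luma < 128 ? 1 : 0);
    }
    return value; // use it to look up metadata delivered on any side channel
}

Calling readFrameCounter() from a requestAnimationFrame loop samples roughly once per displayed frame.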

Erik Alsmyr
  • Notice that this solution is sensitive to video compression. Adjusting the size of the region allocated for each 'letter' in the encoded data, adjusting contrast levels, and adding some encoded bytes of a simple error-correction algorithm (personally I used a popular Reed-Solomon lib) solved the compression issue for my scenario. [I see that this question has a lot of views, so despite the time passed this comment might help others.] – MaMazav Feb 16 '22 at 13:33

OK, first let's get the video and audio using getUserMedia, and turn it into raw data using MediaStreamRecorder (https://github.com/streamproc/MediaStreamRecorder):

/*
 *
 *  Video Streamer
 *
 */


<script src="https://cdn.webrtc-experiment.com/MediaStreamRecorder.js"></script>
<script>

// FIREFOX
// (navigator.getUserMedia is the legacy API in use when this was written;
// modern browsers expose navigator.mediaDevices.getUserMedia instead)

var mediaConstraints = {
    audio: !!navigator.mozGetUserMedia, // don't forget audio!
    video: true                         // don't forget video!
};

navigator.getUserMedia(mediaConstraints, onMediaSuccess, onMediaError);

function onMediaSuccess(stream) {
    var mediaRecorder = new MediaStreamRecorder(stream);
    mediaRecorder.mimeType = 'video/webm';
    mediaRecorder.ondataavailable = function (blob) {
        // POST/PUT "Blob" using FormData/XHR2
    };
    mediaRecorder.start(3000); // emit a blob every 3 seconds
}

function onMediaError(e) {
    console.error('media error', e);
}
</script>

<script>

// CHROME

var mediaConstraints = {
    audio: true,
    video: true
};

navigator.getUserMedia(mediaConstraints, onMediaSuccess, onMediaError);

function onMediaSuccess(stream) {
    // MultiStreamRecorder (same library) records audio and video as separate blobs
    var multiStreamRecorder = new MultiStreamRecorder(stream);
    multiStreamRecorder.video = yourVideoElement; // placeholder for your <video> element
    multiStreamRecorder.audioChannels = 1;
    multiStreamRecorder.ondataavailable = function (blobs) {
        // blobs.audio
        // blobs.video
    };
    multiStreamRecorder.start(3000);
}

function onMediaError(e) {
    console.error('media error', e);
}
</script>

Now you can send the chunks through a DataChannel together with your metadata. Here is a minimal sketch of the sender side (the dataChannel and currentMetadata() helpers are assumptions, not part of the library):
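
mediaRecorder.ondataavailable = function (blob) {
    var reader = new FileReader();
    reader.onload = function () {
        // send a small JSON header first, then the raw chunk bytes;
        // dataChannel and currentMetadata() are assumptions, not library API
        dataChannel.send(JSON.stringify({
            type: 'chunk-header',
            timestamp: Date.now(),      // or your per-frame timestamps
            metadata: currentMetadata() // application content for this chunk
        }));
        dataChannel.send(reader.result); // ArrayBuffer with the video bytes
    };
    reader.readAsArrayBuffer(blob);
};

(Large blobs may need to be split to respect DataChannel message-size limits.)

On the receiver side: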

/*
 *
 *  Video Receiver
 *
 */


var ms = new MediaSource();

var video = document.querySelector('video');
video.src = window.URL.createObjectURL(ms); // attach the MediaSource to the element

ms.addEventListener('sourceopen', function(e) {
  // the codec string must match what the sender recorded
  var sourceBuffer = ms.addSourceBuffer('video/webm; codecs="vorbis,vp8"');
  sourceBuffer.appendBuffer(/* Video chunks here */);
}, false);
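
To pair each chunk with its metadata, here is a hypothetical receive loop matching the sender sketch above (it assumes the sourceBuffer from the sourceopen handler, and that every JSON header is immediately followed by its binary chunk):

var pendingHeader = null;
var metadataByTimestamp = {}; // consult this when frames are presented

dataChannel.onmessage = function (event) {
    if (typeof event.data === 'string') {
        pendingHeader = JSON.parse(event.data);
        metadataByTimestamp[pendingHeader.timestamp] = pendingHeader.metadata;
    } else if (pendingHeader) {
        sourceBuffer.appendBuffer(event.data); // the binary video chunk
        pendingHeader = null;
    }
};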
Jairo
  • Seems like a nice direction. However, I don't understand yet how I can SYNCHRONIZE the metadata to the correct video frame. And anyway, I prefer not to use MediaSource as it has some limitations (e.g. a segment should start with a keyframe: https://code.google.com/p/chromium/issues/detail?id=229412) – MaMazav Dec 02 '15 at 14:08
  • Well, you have the video data, so now you can manipulate it, in a first try maybe with timers, adding more or less latency as needed; it is the only way I think we have right now. Or don't use WebRTC, go for WebSockets. – Jairo Dec 02 '15 at 14:43
  • Timers are not precise enough for my needs. WebSockets is a great idea which I already investigated; however, then I need to use MediaSource, which has its disadvantages. – MaMazav Dec 02 '15 at 14:58
  • Ok, let me write an answer using WebSockets; you don't need MediaSource. The problem here is using WebRTC. The video from the server is a file, static video, not real time like a conference, right? – Jairo Dec 02 '15 at 15:02
  • It is real-time video. Of course I don't have any reason to go for WebRTC if I need metadata and have neither latency requirements nor a peer-to-peer connection. – MaMazav Dec 02 '15 at 15:04
  • There is no way for WebRTC to send metadata in each frame, because it is a protocol for raw media only; believe me, you can't use WebRTC for that. It's better to try to send the data through a fast channel; maybe compare performance and latency for WebSockets and DataChannels. – Jairo Dec 02 '15 at 15:16