
Here is the design I'm trying to implement properly: I have a JavaScript peer that is sending a video track to a Native Code peer. At some point during the transmission (actually immediately after the connection has been established, but it could be at any moment) I want to start a stopwatch on the JS peer side and perform some timed operations, namely some rendering on a canvas overlaying the video playback. On the Native peer side I want to be able to synchronize on the instant the stopwatch started on the JS peer, and consider only received frames recorded after that instant, performing some other kind of processing on them. What I am doing now (a fragile and limiting solution):

  • As soon as the peers connect, tracking RTCPeerConnection.iceConnectionState, I start the stopwatch on the JS peer;
  • As soon as the first webrtc::VideoFrame arrives on the Native peer I store the frame timestamp;
  • On the Native peer I use the first frame's timestamp to compute relative time, in a way similar to what the stopwatch allows me on the JS peer.

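For reference, the current JS-side trigger can be sketched roughly as follows. This is a hedged illustration, not my exact code: `pc` is assumed to be the established `RTCPeerConnection`, and the state/clock are passed in as parameters only so the logic is testable outside a browser.

```javascript
// Sketch of the current (fragile) JS-side stopwatch start.
let stopwatchStartMs = null;

function maybeStartStopwatch(iceState, nowMs) {
  // Start the stopwatch the first time the connection reaches 'connected'.
  if (iceState === 'connected' && stopwatchStartMs === null) {
    stopwatchStartMs = nowMs;
  }
  return stopwatchStartMs;
}

// Browser wiring (assumes `pc` is the RTCPeerConnection):
// pc.oniceconnectionstatechange = () =>
//   maybeStartStopwatch(pc.iceConnectionState, performance.now());
```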
This design is limiting because I may want to synchronize on any instant, not just on connection establishment, and also fragile because I think the WebRTC protocol is allowed to drop the very first received frames for any reason (delays or transmission errors). Ideally I would like to take a timestamp at the chosen synchronization point on the JS peer, send it to the Native peer, and be able to compare it against webrtc::VideoFrame timestamps. I am unable to do it naively because VideoFrame::timestamp_us() is clearly skewed by some amount I am not aware of. Also, I can't interpret VideoFrame::timestamp(), which is poorly documented in api/video/video_frame.h, and VideoFrame::ntp_time_ms() is deprecated and actually always returns -1. What should I do to accomplish this kind of synchronization between the two peers?
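The "send it to the Native peer" half is the easy part and could be done out-of-band over a data channel; a minimal sketch, assuming an established `RTCPeerConnection` named `pc` (the helper name and message shape are my own, purely illustrative):

```javascript
// Hypothetical helper: serializes the synchronization event so it can be
// sent out-of-band over an RTCDataChannel.
function makeSyncMessage(epochMs) {
  return JSON.stringify({ type: 'sync', epochMs });
}

// Browser wiring (assumes an established RTCPeerConnection `pc`):
// const ch = pc.createDataChannel('sync');
// ch.onopen = () => ch.send(makeSyncMessage(Date.now()));
```

The hard part, and the actual question, is making the received `webrtc::VideoFrame` timestamps comparable to that value on the Native peer.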

ceztko
  • I would propose an in-band signal. You intend to manipulate the JS side anyway, using canvas. You can simply send an all-black frame with canvas setPixelData() when you want to start sync. The native client can recognize such a frame, skip it (you don't want flicker, do you?) and switch to the 'other kind of processing'. – Alex Cohn Oct 11 '18 at 19:03
  • I'm not manipulating the JS-side video at all, it's just an overlay in a canvas. The video track is a stream from the webcam obtained with `getUserMedia()`. The sync signal should be as discreet as possible: black frames are definitely not good, and I could also be unlucky enough to miss it completely. Before even thinking of it I'd need proof that manipulating video tracks from `getUserMedia()` is actually possible. But seriously: I think a standard called Web Real-Time Communication should really give me some facilities to synchronize two peer streams in the way I described. – ceztko Oct 11 '18 at 19:37
  • @AlexCohn Very interesting: I discovered that manipulating a `MediaStreamTrack` is possible through redirecting it to a canvas element and capturing it with `HTMLCanvasElement.captureStream()`. This is not yet supported in MS Edge, which means it's not a good solution for me yet. Still, I really think some other solution should be possible on the Native Code peer: if `webrtc::VideoFrame` preserved the RTP timestamp, and this were comparable to the JS side's `Date.now()`, this problem would actually be a no-brainer. – ceztko Oct 12 '18 at 16:11
  • I am afraid that JS has no access to VideoFrame internals, and thus not to its timestamp. As for out-of-band synchronization, I would choose simple UTC. Time of delivery for video frames should be much smaller than the 200 ms between frames in standard video, so you won't miss a single frame between sender and receiver. As for local clock skew, this is probably negligible on modern connected devices, but even if it's not, there are well-known algorithms to compensate for it. – Alex Cohn Oct 12 '18 at 16:50
  • @AlexCohn yes, I know JS has no access to internals (things are [changing](https://www.w3.org/TR/webrtc-stats/#dom-rtcoutboundrtpstreamstats), though): I was talking about the Native Code side also not exposing remote peer timestamps or the estimated skew (not in an obvious way, at least, and the API is not really documented). I asked the same question in [webrtc-discuss](https://groups.google.com/forum/#!topic/discuss-webrtc/npYIyxSBOLI): I still hope some dev actually working on the internals will provide some support. – ceztko Oct 12 '18 at 17:36
  • Nice to know they are working on exposing statistics; it's hard to believe this will be available cross-browser any time soon. In this sense, it's not better than `captureStream()`. – Alex Cohn Oct 14 '18 at 10:59

1 Answer


The design can be properly implemented by sending the synchronization event timestamp, expressed in sender NTP time, to the receiver. The receiver must then be able to estimate the sender NTP timestamp of each received frame, comparing it to the timestamp of the synchronization event. A proof-of-concept patch enabling this method has been pushed to the native WebRTC project in this tracking issue. More details to come later.
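The receiver-side decision then reduces to a single comparison in a common clock. A minimal sketch of that comparison follows; the real native peer would implement it in C++ against `webrtc::VideoFrame`, and `frameSenderNtpMs` is an assumption standing in for whatever per-frame sender NTP estimate the patch exposes (derived from RTP timestamps combined with RTCP sender reports):

```javascript
// Sketch of the receiver-side filter: process a frame only if it was
// captured at or after the synchronization event, both expressed in
// sender NTP time (milliseconds).
function shouldProcessFrame(frameSenderNtpMs, syncEventNtpMs) {
  return frameSenderNtpMs >= syncEventNtpMs;
}
```

This is robust against dropped initial frames, since the decision no longer depends on which frame happens to arrive first.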

ceztko