
I'd like to stream a user's webcam (from the browser) to a server and I need the server to be able to manipulate the stream (run some C algorithms on that video stream) and send the user back information.

I have looked closely at WebRTC and MediaCapture and read the examples here: https://bitbucket.org/webrtc/codelab/overview .

However, this is designed for peer-to-peer video chat. From what I have understood, the MediaStream from getUserMedia is transmitted via an RTCPeerConnection (with addStream); what I'd like to know is: can I use this, but process the video stream on the server?

Thanks in advance for your help

nschoe
  • Yes, you can send the stream to a server and manipulate it there :). What specific questions do you have about it? There are numerous MCU servers out there (check out licode) – Benjamin Trent May 13 '14 at 16:11
  • Thanks for your answer. Well, this example seems suited for peer-to-peer(s) video streaming (though I still haven't managed to make it work :/ ...). What I need to do, and fail to see how, is to stream the user's webcam to a server and manipulate the video stream: how/where do I access the stream (I sure can't have a browser running on the server)? I don't see anywhere in the example code where I get "physical" access to the data! – nschoe May 13 '14 at 17:05
  • You would not use the browser API; you should use the [native C/C++ WebRTC API](https://code.google.com/p/webrtc/source/checkout), and you can get a call from a browser to that app you build with the native API and manipulate it from there. – Benjamin Trent May 13 '14 at 17:20
  • I did not know there was a C/C++ API and I feel a bit stupid not to have thought about it. Seems like the API is big; it will take me time to understand how to manipulate the stream. Do I still need to go to all the trouble of implementing signaling? – nschoe May 13 '14 at 17:51
  • I do not know yet as I just delved into it myself, but licode and others have existing native interfaces that may give you direction. – Benjamin Trent May 13 '14 at 17:57
  • Okay. I will go check the code and when I'm successful, I'll come back here and post an answer for the others. In the meantime, if somebody comes here and knows something about this whole thing, they would be very welcome! (particularly if they can tell us whether implementing signaling is necessary or not :-) ) – nschoe May 13 '14 at 18:00
  • @nschoe Although I haven't used the Native API, signalling still seems crucial for setting up an RTC connection to your server. The SDP (Session Description Protocol) describes who you are, where you are and what media (& codec) you are gonna use ("you" refers to the browser). ICE candidates are also important to establish the connection. I suggest you read something about setting up a WebRTC connection. [This](http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/) has a lot of information about signalling. The system should work kinda the same in C. – MarijnS95 May 13 '14 at 20:53
  • ... I also recommend you look through [this getting started](http://www.webrtc.org/reference/getting-started) document, especially the examples. – MarijnS95 May 13 '14 at 20:54
  • @MarijnS95 Thanks for the links. I will read them thoroughly. I came to the same conclusion that signalling is still needed, but in every example that I have read (on html5rocks and codelab for instance) I can never understand how this works. They say that signalling is independent from WebRTC (_i.e._ we can use whatever means we want for signalling), but in this case, **how** can WebRTC "know" that we have indeed done the signalling part? – nschoe May 14 '14 at 06:43
  • @nschoe They probably mean that it is not implemented for redundancy. The idea is that website makers implement it themselves. So whether you want to send the data over a websocket, XHR long-polling, socket.io, or just present it to the user as stringified text meant to be pasted into the other browser, it is all the developer's choice. How does WebRTC know? Your WebRTC object creates this data and also receives it; for instance, if `pc` is your peerConnection object and `d` is the object you received, you call `pc.setRemoteDescription(new RTCSessionDescription(d));` to apply the SDP. There are a couple.. – MarijnS95 May 14 '14 at 07:18
  • .. event handlers that trigger whenever a connection is stable; `pc.onsignalingstatechange` triggers when the signalling has been done right. `pc.signalingState` contains the current status. The same for the ICE engine: `pc.oniceconnectionstatechange`, `pc.iceGatheringState` and `pc.iceConnectionState`. You can find all this in the [w3 spec](http://www.w3.org/tr/webrtc). – MarijnS95 May 14 '14 at 07:24
  • Okay, thank you for this information. The link you posted ([this one](http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/)) was of great help. However, I still don't understand how this signalling can be done server-side. Since I want my server to receive the stream, not to send one: what information does the server give the user in the signalling channel? – nschoe May 14 '14 at 07:27
  • I found `PeerConnectionInterface::ProcessSignalingMessage` in [this](http://www.webrtc.org/reference/native-apis) document; it is for processing the session sent by the client. However, I have no idea how the native API creates an answer to that. – MarijnS95 May 14 '14 at 08:19

1 Answer


Here is the solution I designed. I post it here for people seeking the same kind of information :-)

Front End side

I use the WebRTC API: get the webcam stream with getUserMedia, and open an RTCPeerConnection (plus an RTCDataChannel for sending information back down to the client). The stream is DTLS encrypted (mandatory), and the multimedia streams use RTP and RTCP. The video is VP8 encoded and the audio is Opus encoded.

Back End side

On the backend, this is the complex part. The best alternative I could find (so far) is the Janus Gateway. It takes care of a lot of stuff, like the DTLS handshake, RTP/RTCP demuxing, etc. Basically, it fires an event each time an RTP packet is received. (RTP packets are typically around the size of the MTU, so there is not a 1:1 mapping between video frames and RTP packets.)
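To make that concrete, here is a minimal sketch (in C) of the glue between the gateway and the pipeline described in the next section. The callback name `on_incoming_rtp` and its arguments are hypothetical, not the actual Janus plugin API; the point is only that every RTP packet the gateway delivers gets wrapped in a GstBuffer and pushed into the pipeline's appsrc.

```c
/* Sketch only: on_incoming_rtp() is a hypothetical hook, not the real Janus
 * plugin entry point -- adapt it to whatever callback your gateway exposes.
 * It wraps each incoming video RTP packet in a GstBuffer and pushes it into
 * the appsrc of the GStreamer pipeline shown below. */
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

/* Set once the pipeline is built (see the next sketch). */
static GstElement *video_src = NULL;

void on_incoming_rtp(const char *data, int len, int is_video)
{
    if (!is_video || video_src == NULL)
        return;

    /* Copy the packet bytes into a new GstBuffer... */
    GstBuffer *buf = gst_buffer_new_allocate(NULL, len, NULL);
    gst_buffer_fill(buf, 0, data, len);

    /* ...and hand it (ownership included) to the appsrc; it then flows
     * through rtpvp8depay -> vp8dec -> ... downstream. */
    gst_app_src_push_buffer(GST_APP_SRC(video_src), buf);
}
```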

I then built a GStreamer (version 1.0) pipeline to depacketize the RTP packets, decode the VP8, handle video scaling and colorspace/format conversion, and output a BGR matrix (compatible with OpenCV). There is an AppSrc component at the beginning of the pipeline and an AppSink at the end.
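As an illustration, here is a minimal sketch of such a pipeline built with `gst_parse_launch`. The caps (payload type 96, clock-rate 90000) and the 640x480 output size are assumptions that must match what your SDP actually negotiates, and `process_frame` is a hypothetical stand-in for the C algorithm that consumes the BGR frames.

```c
/* Minimal sketch of the GStreamer 1.0 pipeline: RTP/VP8 in, BGR frames out.
 * Payload type, clock-rate and output resolution are assumptions; error
 * handling is deliberately minimal. */
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>
#include <gst/app/gstappsink.h>

/* Hypothetical: whatever C algorithm processes a packed BGR frame. */
extern void process_frame(const guint8 *bgr, gsize size);

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);

    GError *err = NULL;
    GstElement *pipeline = gst_parse_launch(
        "appsrc name=src is-live=true format=time "
        "caps=\"application/x-rtp,media=video,encoding-name=VP8,"
        "payload=96,clock-rate=90000\" "
        "! rtpvp8depay ! vp8dec "
        "! videoconvert ! videoscale "
        "! video/x-raw,format=BGR,width=640,height=480 "
        "! appsink name=sink sync=false",
        &err);
    if (pipeline == NULL) {
        g_printerr("Failed to build pipeline: %s\n", err->message);
        return 1;
    }

    /* Keep a handle on both ends: packets go in via 'src' (e.g. assign it to
     * the video_src global used by on_incoming_rtp above), decoded frames
     * come out via 'sink'. */
    GstElement *src  = gst_bin_get_by_name(GST_BIN(pipeline), "src");
    GstElement *sink = gst_bin_get_by_name(GST_BIN(pipeline), "sink");

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    for (;;) {
        /* Blocks until a decoded, converted frame is available. */
        GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(sink));
        if (sample == NULL)   /* EOS or pipeline shut down */
            break;

        GstBuffer *buffer = gst_sample_get_buffer(sample);
        GstMapInfo map;
        if (gst_buffer_map(buffer, &map, GST_MAP_READ)) {
            /* map.data is a packed BGR frame (640x480x3 bytes here), directly
             * usable as the data pointer of an OpenCV CV_8UC3 matrix. */
            process_frame(map.data, map.size);
            gst_buffer_unmap(buffer, &map);
        }
        gst_sample_unref(sample);
    }

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(src);
    gst_object_unref(sink);
    gst_object_unref(pipeline);
    return 0;
}
```

Note that `sync=false` on the AppSink simply means frames are handed over as soon as they are decoded rather than being throttled to the pipeline clock, which is usually what you want when the consumer is an analysis algorithm rather than a display.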

What's left to do

I have to take extra measures to ensure good scalability (threads, memory leaks, etc.) and find a clean and efficient way of using the C++ library I have inside this program.

Hope this helps!

nschoe
  • Can you please provide me with an example of what you did? I'm very interested in this idea and working on implementing it. – Dabbas Mar 14 '16 at 11:30
  • Sorry @Dabbas, I no longer work at the company I was developing the solution for, but if you take the time to implement each step carefully, you should be able to achieve this. I find that many people struggle with WebRTC, so I started writing very detailed articles at http://nschoe.com, and I will end up talking about some code like this, though that will take a little while. It's a bit inactive at the moment, but I should revive it soon :-) Best of luck. – nschoe Apr 05 '16 at 08:27