Your real bottleneck with MESH is that each RTCPeerConnection will do its own video encoding in the browser.
P2P inherently requires each sender to adapt its encoding quality to the receiver's network conditions. So when your browser sends video to peer X (good downlink) and peer Y (poor downlink), the two encodings diverge: Y receives a lower framerate and bitrate than X.
Sounds reasonable, right? Unfortunately, that behavior mandates a separate video encoding for each peer connection.
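To make that concrete, here is a minimal sketch of the mesh fan-out; `peerIds` and the signaling plumbing are placeholders, not part of any real API:

```ts
// Minimal mesh fan-out sketch. `peerIds` and the signaling steps are
// placeholders; the WebRTC calls themselves are the standard API.
async function startMesh(peerIds: string[]): Promise<Map<string, RTCPeerConnection>> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();

  const peers = new Map<string, RTCPeerConnection>();
  for (const peerId of peerIds) {
    const pc = new RTCPeerConnection();
    // Each addTrack() creates its own RTCRtpSender, and the browser runs
    // a separate encoder behind each sender, adapting to that peer's
    // bandwidth estimate independently of all the others.
    pc.addTrack(track, stream);
    peers.set(peerId, pc);
    // ... createOffer()/setLocalDescription() and signaling omitted ...
  }
  return peers;
}
```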
If multiple peer connections could reuse the same video encoding, MESH would be much more viable, but Google never provided that option in the browser. Simulcast requires an SFU to be useful, so it doesn't help in your case.
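For contrast, this is roughly what simulcast looks like on the sender side; it only pays off when an SFU on the receiving end picks the right layer for each subscriber, which is exactly what a mesh lacks:

```ts
// Sender-side simulcast sketch: three encodings of one track. A plain
// remote peer can't select among the layers; an SFU has to do that.
function addSimulcastTrack(pc: RTCPeerConnection, track: MediaStreamTrack): void {
  pc.addTransceiver(track, {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'low',  scaleResolutionDownBy: 4.0, maxBitrate: 150_000 },
      { rid: 'mid',  scaleResolutionDownBy: 2.0, maxBitrate: 500_000 },
      { rid: 'high', maxBitrate: 1_500_000 },
    ],
  });
}
```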
So, how many concurrent video encodings can a browser perform on a typical machine? For 720p 30 fps video, roughly 5-6, no more. For 640x480 at 15 fps, maybe 20.
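Using those rough figures (they are estimates, not benchmarks), you can put an upper bound on mesh size, since each of the n participants has to encode for the other n - 1:

```ts
// Upper bound on mesh size from the encoder budget alone: every
// participant uploads n - 1 separately encoded streams, so n <= budget + 1.
function maxMeshParticipants(encoderBudget: number): number {
  return encoderBudget + 1;
}

maxMeshParticipants(6);  // ~7 people at 720p/30fps
maxMeshParticipants(20); // ~21 people at 640x480/15fps
```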
In my opinion, the encoding layer and the networking layer could have been separated in WebRTC's design; getUserMedia could even be extended into something like getEncodedUserMedia, so that you could send the same encoded content to multiple peers.
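Purely as a thought experiment, such a hypothetical getEncodedUserMedia might look like the sketch below; nothing in it exists in any browser:

```ts
// HYPOTHETICAL API SHAPE; none of this exists in any browser. The idea:
// encode once with fixed parameters, then fan the same encoded frames
// out to every peer, trading per-peer adaptation for a single encoder.
interface EncodedMediaConstraints {
  width: number;
  height: number;
  frameRate: number;
  maxBitrate: number; // one fixed target; no per-peer adaptation
}

declare function getEncodedUserMedia(
  constraints: EncodedMediaConstraints,
): Promise<MediaStreamTrack>;

async function fanOut(peers: Iterable<RTCPeerConnection>): Promise<void> {
  const track = await getEncodedUserMedia({
    width: 1280, height: 720, frameRate: 30, maxBitrate: 1_500_000,
  });
  for (const pc of peers) {
    pc.addTrack(track); // same encoded content goes to every peer
  }
}
```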
So that's the real practical reason people use an SFU for multi-party WebRTC.