There is a similar question, "Coturn server - Relay is not working", but it doesn't apply here: Amazon has its servers behind a NAT, which we don't.

We have a bare metal machine connected to the open internet. When we force the WebRTC connection to use relay, candidates are exchanged and we see a relay candidate, but then there are no streams.
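
For context, forcing relay on the client usually comes down to setting `iceTransportPolicy` to `"relay"`. The following is a minimal sketch, assuming a TURN endpoint on port 443 as in the config below. The helper name `buildRelayOnlyConfig` and the local interfaces are illustrative stand-ins for the browser's `RTCConfiguration` types, not our actual client code.

```typescript
// Local stand-ins for the browser's RTCIceServer / RTCConfiguration types,
// declared here so the sketch type-checks outside a browser.
interface IceServerEntry {
  urls: string[];
  username: string;
  credential: string;
}

interface RelayOnlyConfig {
  iceServers: IceServerEntry[];
  iceTransportPolicy: "relay" | "all";
}

// Build a configuration that only permits relay (TURN) candidates.
// host and turnPort are placeholders supplied by the caller.
function buildRelayOnlyConfig(
  host: string,
  turnPort: number,
  username: string,
  credential: string,
): RelayOnlyConfig {
  return {
    iceServers: [
      {
        urls: [
          `turn:${host}:${turnPort}?transport=udp`,
          `turn:${host}:${turnPort}?transport=tcp`,
        ],
        username,
        credential,
      },
    ],
    // "relay" makes the browser discard host and srflx candidates.
    iceTransportPolicy: "relay",
  };
}

// In a browser: new RTCPeerConnection(buildRelayOnlyConfig(...))
const relayConfigDemo = buildRelayOnlyConfig("turn.example.com", 443, "user", "pass");
console.log(relayConfigDemo.iceTransportPolicy);
```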

Here's the config. We played with all kinds of things, like setting external-ip, switching TLS on and off, to no avail.

listening-ip=<public-address>
listening-port=443
tls-listening-port=443
cert=/etc/star-attach-live-certificate.pem
pkey=/etc/star-attach-live-certificate.key

min-port=49152
max-port=65535

realm=turn.example.com
server-name=servername.example.com

fingerprint
max-bps=550000
use-auth-secret
static-auth-secret=<secret>
check-origin-consistency
userdb=/usr/local/var/db/turndb

The realm points to the load balancer, which may redirect to this or another server, but I guess this is not the issue.
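
Since the config uses `use-auth-secret` with a `static-auth-secret`, clients need time-limited credentials derived from that secret. The sketch below follows the commonly used TURN REST API scheme (username is `<expiry>:<user>`, password is the Base64 HMAC-SHA1 of the username). The function name, the user id, and the TTL are assumptions for illustration, and it assumes a Node.js runtime for `node:crypto`.

```typescript
import { createHmac } from "node:crypto";

interface TurnCredentials {
  username: string;
  credential: string;
}

// Derive time-limited TURN credentials from the shared secret.
// nowMs is injectable so the result can be reproduced in tests.
function makeTurnCredentials(
  secret: string,
  userId: string,
  ttlSeconds: number,
  nowMs: number = Date.now(),
): TurnCredentials {
  // Expiry as a Unix timestamp in seconds.
  const expiry = Math.floor(nowMs / 1000) + ttlSeconds;
  const username = `${expiry}:${userId}`;
  // Password is base64(HMAC-SHA1(secret, username)).
  const credential = createHmac("sha1", secret).update(username).digest("base64");
  return { username, credential };
}

// "<secret>" stands for the static-auth-secret value from the config.
const credentialsDemo = makeTurnCredentials("<secret>", "alice", 3600);
console.log(credentialsDemo.username, credentialsDemo.credential);
```

If the credentials were wrong, the allocation itself would fail, so seeing a relay candidate at all suggests this part is working.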

When I call the server from https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/ I get all 3 candidates.

[Image: ICE Trickle response]

https://test.webrtc.org tool gives me this:

[Image: WebRTC Test Tool]

I believe the result is wrong, because I am seeing the srflx candidate above and communication via srflx is actually working.

Here are both sides from chrome://webrtc-internals. I see no candidate pair being established.

[Image: Client A] [Image: Client B]

Relay candidates look like this:

sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:3193418391 1 udp 24846335 95.216.16.200 55992 typ relay raddr 0.0.0.0 rport 0 generation 0 ufrag qW0i network-id 1 network-cost 10
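
To make the fields of that candidate easier to read, here is a small sketch that splits the `candidate:` attribute into named parts. The `parseCandidate` helper and the `ParsedCandidate` shape are made up for this post; the sketch only handles the fields shown above and ignores the remaining extension pairs.

```typescript
interface ParsedCandidate {
  foundation: string;
  component: number;
  protocol: string;
  priority: number;
  address: string;
  port: number;
  type: string;
  relatedAddress?: string;
  relatedPort?: number;
}

// Split an ICE candidate attribute into its positional and raddr/rport fields.
function parseCandidate(line: string): ParsedCandidate {
  const tokens = line.trim().replace(/^candidate:/, "").split(/\s+/);
  if (tokens.length < 8 || tokens[6] !== "typ") {
    throw new Error("not a well-formed candidate line");
  }
  const parsed: ParsedCandidate = {
    foundation: tokens[0],
    component: Number(tokens[1]),
    protocol: tokens[2],
    priority: Number(tokens[3]),
    address: tokens[4],
    port: Number(tokens[5]),
    type: tokens[7],
  };
  // Everything after the type is a sequence of key/value pairs.
  for (let i = 8; i + 1 < tokens.length; i += 2) {
    if (tokens[i] === "raddr") parsed.relatedAddress = tokens[i + 1];
    if (tokens[i] === "rport") parsed.relatedPort = Number(tokens[i + 1]);
  }
  return parsed;
}

const relayCandidateDemo = parseCandidate(
  "candidate:3193418391 1 udp 24846335 95.216.16.200 55992 typ relay " +
    "raddr 0.0.0.0 rport 0 generation 0 ufrag qW0i network-id 1 network-cost 10",
);
console.log(relayCandidateDemo.type, relayCandidateDemo.address, relayCandidateDemo.port);
// prints "relay 95.216.16.200 55992"
```

The relayed port 55992 falls inside the `min-port`/`max-port` range from the config, and `raddr 0.0.0.0 rport 0` means the candidate does not disclose a related address.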

We're running out of ideas for what else to try. We have even swapped the server to rule out hardware issues. We run coturn in a Docker container on Alpine. What we haven't tried yet is Ubuntu 16.04 LTS.

Can anybody see what's wrong, or suggest what else we could try?


Extra information:

After digging in further, we found that relay sometimes works great, sometimes does not. While it works fine in the US, it seems issues happen when a user in the EU is called. I suspect EU providers are hiding prflx candidates, forcing a relay connection which then fails. Even where it's normally not working, it works on rare occasions, and when I force a relay connection from my US location, the streams are relayed just fine. When it works, bandwidth used on coturn is almost the same for in and out, but when it fails, inbound bandwidth is a lot higher than outbound, as seen in this image.

[Image: Bandwidth used on coturn]

I can't add the complete dump here, but I posted it into https://fippo.github.io/webrtc-dump-importer/ and extracted the relevant part showing connection establishment and candidate pairs.

[Image: Candidate pairs]

Oliver Hausler
  • All that looks OK. You might want to check whether two clients on the same server can connect to each other. If not, it's probably due to firewall rules blocking UDP traffic from the relayed candidates. – Philipp Hancke Apr 27 '19 at 09:59
  • As Philipp mentioned, it looks like the firewall rules. Could you also create a dump from `chrome://webrtc-internals/` and paste it here? – Mariusz Beltowski Apr 27 '19 at 15:50
  • It's solved. @PhilippHancke pointed us in the right direction. It wasn't a firewall, but packets were blocked because we've been running coturn in a Docker container with host networking enabled. coturn had superuser privilege inside the container, but the container itself did not. Ports < 1024 are blocked without superuser privilege on Linux. Docker apparently manages this for REST (which is why STUN and the admin portal worked), but not for coturn. In other words, the container must be privileged, too. – Oliver Hausler Apr 27 '19 at 16:00
  • @MariuszBeltowski Well, apparently there were two issues here. Philipp's suggestion solved one. I believe the remaining issue is with coturn, and I filed it at https://github.com/coturn/coturn/issues/387, but I'm not entirely sure; it could also be a config issue, even though we have tried almost everything by now. I can't easily post the complete dump here and I don't think I am supposed to, either, but I have created a Gist at https://gist.github.com/oliverhausler/90013c86a8ff2015b0b394d01ce1e824 in case you need access to the full dump. The second connection is the one that failed. – Oliver Hausler May 10 '19 at 13:54
  • We found the problem. It was a bug in our client. After creating the answer, the peer connection was initialized asynchronously (`=>`), and if that took too long and TURN was too fast, it wouldn't receive all candidates. Strange that something like this can result in packet loss on coturn, but it did. @PhilippHancke thanks again for pointing me in the right direction. Write a response if you like, so I can accept it. – Oliver Hausler May 11 '19 at 17:26
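
The race described in the last comment is commonly avoided by buffering remote candidates until the peer connection is ready. This is a minimal sketch of that idea, not the actual fix from our client: the `PendingCandidates` class is hypothetical, and in a browser the sink would typically wrap `pc.addIceCandidate`.

```typescript
// Hypothetical helper: holds remote ICE candidates that arrive before the
// peer connection has been initialized, then forwards them in order.
class PendingCandidates<T> {
  private queue: T[] = [];
  private sink: ((candidate: T) => void) | null = null;

  // Call from the signaling handler for every remote candidate.
  add(candidate: T): void {
    if (this.sink) {
      this.sink(candidate);
    } else {
      this.queue.push(candidate);
    }
  }

  // Call once the peer connection exists and the remote description is set.
  attach(sink: (candidate: T) => void): void {
    this.sink = sink;
    // Flush everything that arrived early, preserving arrival order.
    for (const candidate of this.queue) {
      sink(candidate);
    }
    this.queue = [];
  }
}

// Usage with plain strings standing in for RTCIceCandidateInit objects.
const pendingDemo = new PendingCandidates<string>();
const deliveredDemo: string[] = [];
pendingDemo.add("early-1");
pendingDemo.add("early-2");
pendingDemo.attach((c) => { deliveredDemo.push(c); });
pendingDemo.add("late-1");
console.log(deliveredDemo.length);
```

With this shape, a fast TURN server can deliver candidates before the asynchronous setup finishes without any of them being dropped.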

0 Answers