I previously wrote up a very consolidated rough answer on how P2P roughly works with some discussion on various protocols and corresponding open-source libraries. You can read it here.
The reliability of P2P is ultimately a result of how much you invest in it from both a client coding perspective and a service configuration (i.e. signaling servers and relays). You can settle for easy NAT traversal of UDP with no firewall support. Maybe a little more effort and you get TCP connectivity. And you can go "all the way" and have relays that have HTTPS listeners for clients behind the hardest of firewalls to traverse.
As to the answer of your question about firewalls. Depends on how the Firewall is configured. Many firewalls are just glorified NATs with security to restrict traffic to certain ports and block unsolicited incoming connections. Others are extremely restrictive and just allow HTTP/HTTPS traffic over a proxy.
The video conference apps will ultimately fallback to emulating an HTTPS connection over the PC's configured proxy server to port 443 (or 80) of a remote relay server if it can't get directly connected. (And in some cases, the remote client will try to listen on port 80 or port 443 so it can connect direct).
You are absolutely right to assume that having all the clients going through a relay will be expensive to maintain. If your goal is 100% connectivity no matter what type of firewall the clients is behind, some relay solution will have to exist. If you don't support a relay solution, you can invest heavily in getting the direct connectivity to work reliably and only have a small percentage of clients blocked.
Hope this helps.