I'm logging disconnects in my web game. About 75% of sessions disconnect with close code 1001 (going away, the expected code when a tab is closed) and 25% disconnect with code 1006 (abnormal closure). The codes are defined in RFC 6455: https://www.rfc-editor.org/rfc/rfc6455

Sometimes the close reason contains this text:

CloudFlare WebSocket Proxy restarting

But the majority of 1006 disconnects don't give any reason at all. This usually happens at the 5-30 minute mark, while the player is actively playing the game.

My setup consists of:

My questions are:

  1. How can I debug this problem better?
  2. What are the common causes of this problem?
Esqarrouth
  • 1006 indicates that the connection was closed in a non-graceful manner. This will happen when the browser (or tab?) is simply killed (rage quit? :D). I'm not sure whether a browser is obligated to send a close frame when it is closed (I think not), and even if it is, I'm pretty sure not all of them do. Another case is when the connection dies for any reason, for example when a mobile device goes out of network range. Yes, on mobile this will definitely happen more often. Anyway, I don't see a reason to treat 1006 specially. – freakish Feb 27 '19 at 16:30
  • It's happening at times when it should not happen: the player has a solid network connection, doesn't close the tab, is playing, is winning (it's a PC browser game). 25% of connections going away like this sounds like too much. There is definitely a bug somewhere and I am out of ideas on what to try next. – Esqarrouth Feb 27 '19 at 16:38
  • How do you know that the user doesn't close the tab? Also, how much traffic goes over the connection? Is it possible that it's a timeout on some proxy server? – freakish Feb 27 '19 at 16:39
  • 1. When I try to close the tab while testing, I get the error code 1001. 2. Players occasionally send me a screenshot of their screen and JavaScript console. – Esqarrouth Feb 27 '19 at 16:43
  • Are you sure that none of the intermediate servers kill the connection before the 30s ping? If so, then this might be an ISP/firewall issue. – freakish Feb 27 '19 at 16:45
  • Is the connection encrypted? – freakish Feb 27 '19 at 16:48
  • I am not sure of anything. But I did have that problem before, and implemented ping/pong to solve it. I've verified that I solved that problem. If it is still happening, it has nothing to do with the 30s ping and something to do with ISP/firewall (which I doubt, because there are more than 1000 games played daily). But if that is the case, how could I debug or experiment with that? – Esqarrouth Feb 27 '19 at 16:51
  • Yes it's encrypted – Esqarrouth Feb 27 '19 at 16:51
  • This sort of thing is very difficult to debug because (I presume) you don't know how to reproduce it. You could try contacting one of the players who has the issue. – freakish Feb 27 '19 at 17:58
  • Correct, I don't know how to reproduce it, players also don't know how to reproduce it. Apparently it happens randomly when playing. Not every time. They can sometimes finish their games, sometimes not. – Esqarrouth Feb 27 '19 at 18:00
  • So even among affected players this still happens rarely? Have you tried gathering browser data when the disconnect happens? – freakish Feb 27 '19 at 18:02
  • Not rarely. From my observation it happens every 5-10 games. It has never happened to me personally, except for the Cloudflare WebSocket proxy restarting, which I am guessing accounts for less than 10% of the disconnects. – Esqarrouth Feb 27 '19 at 18:08
  • What kind of browser data should I gather, and with what tools? (If you mean basics like Chrome version, OS version, etc., I collected that data and see no correlation. People with exactly the same OS and browser versions as me are getting disconnects. Other browsers get them too, though more rarely in browsers like Safari.) – Esqarrouth Feb 27 '19 at 18:09
  • Just a hunch: it sounds like the JavaScript in the browser has a long-running loop somewhere, causing the JavaScript thread / browser tab to fail (the browser will re-initialize the tab, but all the resources will be cleared in a less than graceful way). – Myst Feb 27 '19 at 20:20
  • @Myst There is a game screen and a chat box. When the websocket disconnects, an error message shows up inside the chat box. The game state stays the same, the chat box shows error messages, and within 1 second it reconnects. So the tab doesn't fail, refresh, or close. Here is the game: https://katan.io/ Do you still think it might be a client-side long-running loop given this additional information? Or does this information rule out that option? – Esqarrouth Feb 27 '19 at 20:38
  • @Esqarrouth, I misunderstood the question. If the error is on the client side, then the long-running loop option is ruled out. What you're left with is most likely a socket timeout/closure on either the server (doesn't seem that way) or an intermediary. – Myst Feb 27 '19 at 23:51
  • Thanks, this gives me a few ideas on how to isolate it. – Esqarrouth Feb 28 '19 at 03:50
  • Maybe this answer will help: https://stackoverflow.com/a/19305172/470749 – Ryan Mar 19 '19 at 14:35
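
One way to act on the "gather browser data when the disconnect happens" suggestion in the comments above is to record timing alongside the close code. The following is only a sketch under assumptions: `buildCloseReport`, `instrument`, and `reportUrl` are hypothetical names introduced here, not part of the asker's code or of any library, and the logging endpoint is assumed to exist.

```typescript
interface CloseReport {
  code: number;
  reason: string;
  wasClean: boolean;
  connectedForMs: number; // how long the socket had been open
  idleForMs: number;      // time since the last message was received
}

// Pure helper: turns the close event fields plus timestamps into a report.
function buildCloseReport(
  close: { code: number; reason: string; wasClean: boolean },
  openedAt: number,
  lastMessageAt: number,
  now: number,
): CloseReport {
  return {
    code: close.code,
    reason: close.reason,
    wasClean: close.wasClean,
    connectedForMs: now - openedAt,
    idleForMs: now - lastMessageAt,
  };
}

// Browser-side wiring. `reportUrl` is a hypothetical logging endpoint.
function instrument(socket: WebSocket, reportUrl: string): void {
  let openedAt = Date.now();
  let lastMessageAt = openedAt;

  socket.addEventListener("open", () => {
    openedAt = lastMessageAt = Date.now();
  });
  socket.addEventListener("message", () => {
    lastMessageAt = Date.now();
  });
  socket.addEventListener("close", (ev) => {
    const report = {
      ...buildCloseReport(ev, openedAt, lastMessageAt, Date.now()),
      online: navigator.onLine,           // browser's view of connectivity
      tabState: document.visibilityState, // "visible" or "hidden"
    };
    // sendBeacon queues the request even if the page is being unloaded.
    navigator.sendBeacon(reportUrl, JSON.stringify(report));
  });
}
```

If many 1006 reports share a similar `idleForMs`, that would hint at an idle timeout on some intermediary; closes with a very small `idleForMs` during active play would point elsewhere, for example at a proxy restart.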

1 Answer

This specific error occurs because Cloudflare updates the software or configuration of its SSL, firewall, Nginx, or physical servers.

An update to almost any layer of their stack will drop your WebSocket connections. You have two solutions:

  1. Route your WebSocket traffic so that it doesn't go through Cloudflare
  2. Implement automatic reconnection logic
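
As an illustration of the second option, here is a minimal sketch of client-side reconnection with capped exponential backoff, using the standard browser `WebSocket` API. The names `backoffDelayMs` and `connectWithRetry` and the specific delay values are assumptions made up for this example, not what the game actually uses.

```typescript
// Delay before reconnect attempt number `attempt` (0-based):
// doubles each time, capped at maxMs. Values are illustrative.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: opens a socket and reopens it whenever it closes,
// until the returned close() is called.
function connectWithRetry(
  url: string,
  onMessage: (data: unknown) => void,
): { close: () => void } {
  let attempt = 0;
  let stopped = false;
  let socket: WebSocket | null = null;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const open = (): void => {
    socket = new WebSocket(url);
    socket.onopen = () => {
      attempt = 0; // a successful connection resets the backoff
    };
    socket.onmessage = (ev) => onMessage(ev.data);
    socket.onclose = (ev) => {
      if (stopped) return;
      // Log what the browser reports; 1006 usually comes with an empty reason.
      console.warn(`socket closed: code=${ev.code} reason="${ev.reason}"`);
      // Small random jitter so clients dropped together do not reconnect in lockstep.
      const jitter = Math.random() * 250;
      timer = setTimeout(open, backoffDelayMs(attempt) + jitter);
      attempt += 1;
    };
  };

  open();
  return {
    close: () => {
      stopped = true;
      if (timer !== undefined) clearTimeout(timer);
      socket?.close(1000, "client closing");
    },
  };
}
```

The jitter matters in this scenario in particular: when a proxy restart drops many players at once, spreading out the reconnects avoids a burst of simultaneous handshakes. Any game state that must survive the gap still has to be resynchronized by the application after the socket reopens.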
Esqarrouth
  • Looks like you posted this answer to your own question a full two years later. As someone seeing a similar issue, thanks for following up! Curious, how did you figure out that Cloudflare was the cause? – ESRogs Mar 25 '22 at 21:34
  • By getting on a call with one of their engineers, who was trying to sell the enterprise package, and asking them questions. – Esqarrouth Mar 26 '22 at 13:58
  • A possible third mitigation is client-side load balancing, i.e. opening two connections so there is a better chance that at least one of them stays up to communicate over. – Konstantin Möllers Jun 01 '23 at 17:00