I’m investigating a scenario with a live dashboard (an Angular web app) that refreshes by polling every 5 seconds. The API sits behind Azure Traffic Manager, which will fail over to a second region if the primary region fails. Keep in mind that Azure Traffic Manager works at the DNS level.
The problem I am facing is that the browser maintains a persistent connection to the primary region even after the Traffic Manager has failed over. The requests initially fail with 503s, but then continue to fail with 502s. The DNS lookup is never performed again as the requests occur more frequently than the keep-alive timeout. This causes the browser to continue to make requests to the failed region.
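To make the mechanism concrete, here is a toy model of what I think is happening. It is only a sketch under two assumptions: that the browser re-resolves DNS only when it has to open a new connection, and that the idle (keep-alive) timeout is about 2 minutes, which is what I observe. The function name and parameters are made up for illustration; this is not the dashboard's actual code.

```typescript
// Toy model (assumption): a client only performs a DNS lookup when it must
// open a new connection, i.e. on the first request or after the connection
// has been idle for at least the keep-alive timeout.
function countDnsLookups(
  pollIntervalMs: number,
  keepAliveTimeoutMs: number,
  durationMs: number
): number {
  let lookups = 0;
  let lastRequestAt: number | null = null;

  for (let t = 0; t <= durationMs; t += pollIntervalMs) {
    // Time since the previous request; the very first request has no connection yet.
    const idleFor = lastRequestAt === null ? Infinity : t - lastRequestAt;

    if (idleFor >= keepAliveTimeoutMs) {
      // Connection was closed (or never existed), so a new lookup is needed.
      lookups++;
    }
    lastRequestAt = t;
  }
  return lookups;
}
```

With a 5-second poll and a 2-minute idle timeout, `countDnsLookups(5000, 120000, 600000)` returns 1: only the initial lookup happens in ten minutes. Polling every 130 seconds instead, `countDnsLookups(130000, 120000, 600000)` returns 5, because every request outlives the idle timeout.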
Is there any way to explicitly kill the connection to force a DNS lookup? The only ways I’ve found so far are to stop making requests for 2 minutes or to close and reopen the browser. Neither is an acceptable solution for a dashboard that is supposed to be hands-off and always fresh.
What’s interesting is that once the browser has failed over to the secondary region, if I restart the primary region the browser automatically switches back to it after about a minute. This tells me the connection respects the DNS TTL when the service is functioning properly, but not when the server is unavailable. It makes no sense to me that the browser would lock onto a single IP indefinitely when the server behind it cannot be reached.
Is there something I am missing about implementing geo-redundant failover with Traffic Manager for a web application? It seems very odd to me that the user would have to stop making requests for 2 minutes before the browser re-resolves the IP and reaches the failed-over region. Am I expected to turn off keep-alive to truly support near-instant failover?
Here's a diagram that describes this scenario:
Diagram