
I have an Azure website (website.mycompany.com) that uses a WCF service for some data. The WCF service sits behind an Azure Traffic Manager profile (service.mycompany.com) running in "priority mode", with 2 instances of the service for failover handling. In priority mode, the primary always serves the data unless it is unavailable. If it is unavailable, the 2nd instance replies, and so on down the line.

We've had a few incidents recently where the primary endpoint for service.mycompany.com was offline. Partners who point to service.mycompany.com detected the switch and all was fine. Lately, however, our own site (website.mycompany.com) does NOT detect the Traffic Manager switch, and the website has errors because the service fails to reply.

Our failover endpoint was up during these incidents, and in the past the Azure website detected the switch. We have only recently encountered this issue. Has anyone experienced similar issues? Are there perhaps any DNS settings that we need to tweak in our Azure website to help it respect TTLs?

ewitkows

2 Answers

Has anyone experienced similar issues?

Do you mean that Traffic Manager can't switch to another endpoint immediately?

Traffic Manager works at the DNS level. Here are the reasons why it can't switch immediately:

  1. DNS responses are cached by clients and by recursive DNS servers. The duration of the cache is determined by the 'time-to-live' (TTL) property of each DNS record. Shorter values result in faster cache expiry and thus more round-trips to the Traffic Manager name servers. Longer values mean that it can take longer to direct traffic away from a failed endpoint.

  2. Traffic Manager endpoint monitoring affects the response time, because an endpoint has to be detected as failed before traffic is directed away from it. For more information about how Azure Traffic Manager works, please refer to the link.
    The following timeline is a detailed description of the monitoring process.
    [Image: timeline of the Traffic Manager monitoring process]

  3. We can also check the Traffic Manager profile using nslookup and ipconfig in Windows. For how to verify Traffic Manager settings, please refer to the link.
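As a rough companion to the nslookup check in point 3, the sketch below resolves the Traffic Manager hostname from code and compares two lookups to see whether the returned endpoint changed. The function names `resolve_addresses` and `endpoint_changed` are made up for this illustration, and `service.mycompany.com` is the placeholder hostname from the question. Note that this goes through the operating system's resolver, so it shows what the machine currently gets back (including any local caching), not necessarily what the Traffic Manager name servers would answer right now.

```python
import socket
from typing import Callable, List


def resolve_addresses(hostname: str, port: int = 443,
                      resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated IP addresses reported for hostname.

    The resolver parameter exists only so the lookup can be faked in tests;
    by default the system resolver is used via socket.getaddrinfo.
    """
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def endpoint_changed(before: List[str], after: List[str]) -> bool:
    """True when two lookups returned different sets of addresses."""
    return set(before) != set(after)


# Example usage (performs a real DNS lookup, so it is left commented out):
# first = resolve_addresses("service.mycompany.com")
# ... wait longer than the record's TTL, or trigger a failover ...
# second = resolve_addresses("service.mycompany.com")
# print(first, second, endpoint_changed(first, second))
```

Running the lookup from the web server itself, rather than from a workstation, is the more useful comparison here, since the question is about what the website sees.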

By the way, because traffic manager works at the DNS level, it cannot influence existing connections to any endpoint. When it directs traffic between endpoints (either by changed profile settings, or during failover or failback), Traffic Manager directs new connections to available endpoints. However, other endpoints might continue to receive traffic via existing connections until those sessions are terminated. To enable traffic to drain from existing connections, applications should limit the session duration used with each endpoint.
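To make "limit the session duration" concrete, here is a minimal, language-neutral sketch in Python of a holder that throws away its connection once it reaches a maximum age, so the next call has to connect (and resolve the hostname) again. `RecyclingConnection`, `factory`, and `max_age_seconds` are hypothetical names for this illustration; this is not a WCF API. In a .NET client, settings such as `ServicePoint.ConnectionLeaseTimeout` and `ServicePointManager.DnsRefreshTimeout` may be worth checking for a similar effect.

```python
import time
from typing import Any, Callable


class RecyclingConnection:
    """Holds one connection and replaces it once it is max_age_seconds old."""

    def __init__(self, factory: Callable[[], Any], max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._factory = factory          # creates a new connection (assumed to connect by hostname)
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so the age check can be tested
        self._conn = None
        self._created_at = 0.0

    def get(self) -> Any:
        """Return the current connection, recreating it if it has expired."""
        now = self._clock()
        if self._conn is not None and now - self._created_at >= self._max_age:
            self._close()
        if self._conn is None:
            # Creating a new connection gives the factory a chance to
            # resolve the hostname again and reach the current endpoint.
            self._conn = self._factory()
            self._created_at = now
        return self._conn

    def _close(self) -> None:
        """Close the held connection if it exposes close(), then forget it."""
        close = getattr(self._conn, "close", None)
        if callable(close):
            close()
        self._conn = None
```

A maximum age in the same range as the DNS TTL seems like a reasonable starting point, since a much shorter value adds reconnect overhead without making failover visible any sooner.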

Jason Ye

I'm going to refer you to my answer here because while the situation isn't exactly the same, it seems like it could have the same solution. To summarize, I find it likely that you have a connection left open to the down service that isn't being properly closed. This connection is independent of TTL, which only deals with DNS caching, and as such bypasses Traffic Manager completely.
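The idea of not holding on to a connection to the down service can be sketched as follows, in Python rather than WCF for brevity. `call_with_reconnect`, `connect`, and `request` are hypothetical names introduced for this example; the point is only that a failed connection is closed and discarded, and the retry opens a new one instead of reusing the old one.

```python
from typing import Any, Callable


def call_with_reconnect(connect: Callable[[], Any],
                        request: Callable[[Any], Any],
                        attempts: int = 2) -> Any:
    """Run request(conn); on a network error, discard conn and try a new one."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        conn = connect()  # a new connection for every attempt
        try:
            return request(conn)
        except OSError as exc:
            # ConnectionError and TimeoutError are subclasses of OSError.
            last_error = exc
        finally:
            # Always close, so a broken connection is never reused.
            close = getattr(conn, "close", None)
            if callable(close):
                close()
    raise last_error
```

Opening a connection per call is the simplest way to show the principle; a real client would more likely keep a healthy connection and discard it only after a fault.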

dornadigital