10

I've been using Autoscale to shift between 2 and 1 instances of a cloud service in a bid to reduce costs. This mostly works, except that from time to time (I'm not sure what the pattern is) the act of scaling up (1->2) causes both instances to recycle, generating a service outage for users.

Assuming nothing fancy is going on in RoleEntryPoint in response to topology changes, why would scaling from 1->2 restart the already running instance?

Additional notes:

  • It's clear both instances are recycling by looking at the Instances tab in Management Portal. Outage can also be confirmed by hitting the public site.
  • It doesn't happen consistently but I'm not sure what the pattern is. It feels like when the 1-instance configuration has been running for multiple days, attempts to scale up recycle both. But if the 1-instance configuration has only been running for a few hours, you can scale up and down without outages.
  • The first instance always comes back much faster than the 2nd instance being introduced.
Youngjae
Nariman

3 Answers

2

It has always been this way. When you have 1 server running and you go to 2+, the initial server is restarted. In order to have a full SLA, you need to have 2+ servers at all times.

Igorek
  • Yes, this should be the answer. Microsoft even warns about not having 1 role on many VMs prior to publishing your service. In fact, you can set a property to prevent deployments to only one instance. – Anthony Mason Jun 29 '16 at 20:16
  • Any link to official documentation for this? I thought the recommendation for having more than 1 instance was always due to making sure you have instances in different Fault Domains for when MS reboot instances for windows update or carry out maintenance? Where do they say that scaling from 1 instance to 2 instances causes downtime? – oatsoda Sep 16 '16 at 08:51
0

You should be able to control this behavior. In the RoleEntryPoint, there are events you can trap for: RoleEnvironment.Changing and RoleEnvironment.Changed.

A shell of some code to put into your solution will look like...

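// Subscribe in OnStart(), before returning base.OnStart().
RoleEnvironment.Changing += RoleEnvironmentChanging;

// Raised before a change is applied; e.Cancel is available here.
private void RoleEnvironmentChanging(object sender, RoleEnvironmentChangingEventArgs e)
{
}

RoleEnvironment.Changed += RoleEnvironmentChanged;

// Raised after a change has been applied.
private void RoleEnvironmentChanged(object sender, RoleEnvironmentChangedEventArgs e)
{
}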

Then, inside the RoleEnvironmentChanging method (only its event args expose Cancel), we can detect what the change is and tell Azure whether we want to restart or not.

// Requires: using System.Linq;
if (e.Changes.Any(change => change is RoleEnvironmentConfigurationSettingChange))
{
    e.Cancel = false; // don't recycle the role; setting this to true requests a restart
}
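
To help pin down what kind of change a scale operation actually delivers, the handlers could be wired up roughly as below. This is only a sketch: it assumes the classic Microsoft.WindowsAzure.ServiceRuntime SDK is referenced, the handler names OnChanging and OnChanged are made up for illustration, and it deliberately leaves e.Cancel untouched. It only logs, so it does not by itself establish whether the instance stays in load balancer rotation.

```csharp
using System.Diagnostics;
using System.Linq;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Subscribe before the role reports ready so later changes are observed.
        RoleEnvironment.Changing += OnChanging;
        RoleEnvironment.Changed += OnChanged;
        return base.OnStart();
    }

    // Hypothetical handler name. Logs what is about to change; Cancel stays false,
    // so this handler never requests a recycle.
    private static void OnChanging(object sender, RoleEnvironmentChangingEventArgs e)
    {
        bool topology = e.Changes.Any(c => c is RoleEnvironmentTopologyChange);
        bool settings = e.Changes.Any(c => c is RoleEnvironmentConfigurationSettingChange);
        Trace.TraceInformation("Changing: topology={0}, settings={1}", topology, settings);
    }

    // Hypothetical handler name. Logs each topology change after it has been applied.
    private static void OnChanged(object sender, RoleEnvironmentChangedEventArgs e)
    {
        foreach (var change in e.Changes.OfType<RoleEnvironmentTopologyChange>())
        {
            Trace.TraceInformation("Topology changed for role {0}", change.RoleName);
        }
    }
}
```

Comparing these trace entries with the timestamps on the Instances tab may show whether a recycle coincides with a topology change or with something else.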
Martin Liversage
BrentDaCodeMonkey
  • I believe that if you have 1 server, it is still getting restarted & yanked out of the LB, regardless of handling of Changed event. This is based on numerous comments from AzureWatch customers – Igorek Mar 07 '14 at 22:13
  • Thanks Igor, I'll try to carve some time out this weekend to verify. :) – BrentDaCodeMonkey Mar 08 '14 at 01:35
  • I will try to put together some details when I get a chance, but essentially Igor is correct. What happens is that when the 2nd instance reaches the Ready state then the first one is taken out of LB rotation in order to process the topology change. During this time if instance 2 has a long w3wp warmup time then clients will timeout or have really long running requests. – kwill Mar 08 '14 at 05:12
  • That makes sense and mirrors issues I've seen on the IaaS side with availability sets. But what I'm still wondering about is the restart Igor's seeing on single instances. – BrentDaCodeMonkey Mar 09 '14 at 15:51
0

Nariman, see my comment on Brent's post for some information about what is happening. You should be able to resolve this with the following code:

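using System.Net;
using System.Net.Sockets;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // For information on handling configuration changes
        // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.

        // Find this instance's IPv4 address.
        IPHostEntry ipEntry = Dns.GetHostEntry(Dns.GetHostName());
        string ip = null;
        foreach (IPAddress ipaddress in ipEntry.AddressList)
        {
            if (ipaddress.AddressFamily == AddressFamily.InterNetwork)
            {
                ip = ipaddress.ToString();
            }
        }

        if (ip != null)
        {
            // Request the local site so w3wp is warm before the instance reports Ready.
            string urlToPing = "http://" + ip;
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(urlToPing);
            try
            {
                using (WebResponse resp = req.GetResponse()) { }
            }
            catch (WebException)
            {
                // A failed warm-up request should not stop the role from starting.
            }
        }
        return base.OnStart();
    }
}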
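
If a single request is not enough to get w3wp warm, the same idea could be extended with a bounded retry. The sketch below is an assumption layered on the code above, not part of the original answer: the WarmUp class and its WaitUntilReady and Ping methods are hypothetical names, and it uses only System.Net and System.Threading.

```csharp
using System;
using System.Net;
using System.Threading;

// Hypothetical helper, not part of the Azure SDK.
public static class WarmUp
{
    // Calls probe up to maxAttempts times, sleeping for delay between failed attempts.
    // Returns true as soon as probe succeeds, false if every attempt fails.
    public static bool WaitUntilReady(Func<bool> probe, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (probe())
            {
                return true;
            }
            if (attempt < maxAttempts)
            {
                Thread.Sleep(delay);
            }
        }
        return false;
    }

    // Issues one GET against url; any WebException (timeout, connection failure,
    // HTTP error status) counts as "not ready yet".
    public static bool Ping(string url, int timeoutMs)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Timeout = timeoutMs;
            using (request.GetResponse())
            {
                return true;
            }
        }
        catch (WebException)
        {
            return false;
        }
    }
}
```

In OnStart this might be called as WarmUp.WaitUntilReady(() => WarmUp.Ping(urlToPing, 30000), 5, TimeSpan.FromSeconds(5)); the attempt count and timeouts here are placeholders to tune against the site's real warm-up time.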
kwill
  • This makes a request to itself to bring up w3wp so an end-user request isn't necessary? I would much prefer that OnStart not be called on this already running instance. – Nariman Mar 08 '14 at 13:03
  • This code will only run on instance 2 - instance 1 is already running and doesn't recycle. This code, when run on instance 2, will delay instance 2 from reaching the Ready state until w3wp is ready. This means that when instance 1 is temporarily removed from load balancer rotation then instance 2 is ready to receive traffic. – kwill Mar 08 '14 at 17:08
  • I see instance 1 removed from the LB (with topology changed event firing) long before instance 2 is up and in a ready state. I think the topology change event happens even before the 2nd VM is created – Nariman Mar 18 '14 at 19:34