Use Application Autoscaling Group with ELB Healthchecks

Question

Has anybody succeeded in using an Application Autoscaling group with an ELB health check? It replaces the instances over and over. Is there a way to prevent that?

My template looks like this:

Resources:
  ECSAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AvailabilityZones:
        - Fn::Select:
            - '0'
            - Fn::GetAZs:
                Ref: AWS::Region
        - Fn::Select:
            - '1'
            - Fn::GetAZs:
                Ref: AWS::Region
        - Fn::Select:
            - '2'
            - Fn::GetAZs:
                Ref: AWS::Region
      VPCZoneIdentifier:
        - Fn::ImportValue: !Sub ${EnvironmentName}-PrivateEC2Subnet1
        - Fn::ImportValue: !Sub ${EnvironmentName}-PrivateEC2Subnet2
        - Fn::ImportValue: !Sub ${EnvironmentName}-PrivateEC2Subnet3
      HealthCheckGracePeriod: !Ref ASGHealthCheckGracePeriod
      HealthCheckType: !Ref ASGHealthCheckType
      LaunchTemplate:
        LaunchTemplateId: !Ref ECSLaunchTemplate
        Version: 1
      MetricsCollection:
        - Granularity: 1Minute
      ServiceLinkedRoleARN:
        !Sub arn:aws:iam::${AWS::AccountId}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling
      DesiredCapacity: !Ref ASGDesiredCapacity
      MinSize: !Ref ASGMinSize
      MaxSize: !Ref ASGMaxSize
      TargetGroupARNs:
        - Fn::ImportValue: !Sub ${EnvironmentName}-WebTGARN
          Fn::ImportValue: !Sub ${EnvironmentName}-DataTGARN
          Fn::ImportValue: !Sub ${EnvironmentName}-GeneratorTGARN
      TerminationPolicies:
        - OldestInstance

The launch template looks like this:

ECSLaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateName: ECSLaunchtemplate
    LaunchTemplateData:
      ImageId: !FindInMap [AWSRegionToAMI, !Ref "AWS::Region", AMI]
      InstanceType: !Ref InstanceType
      SecurityGroupIds:
        - Fn::ImportValue: !Sub ${EnvironmentName}-ECSInstancesSecurityGroupID
      IamInstanceProfile:
        Arn:
          Fn::ImportValue:
            !Sub ${EnvironmentName}-ecsInstanceProfileARN
      Monitoring:
        Enabled: true
      CreditSpecification:
        CpuCredits: standard
      TagSpecifications:
        - ResourceType: instance
          Tags:
            - Key: "keyname1"
              Value: "value1"
      KeyName:
        Fn::ImportValue:
          !Sub ${EnvironmentName}-ECSKeyPairName
      UserData:
        "Fn::Base64": !Sub
          - |
            #!/bin/bash
            yum update -y
            yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
            yum update -y aws-cfn-bootstrap hibagent
            /opt/aws/bin/cfn-init -v --region ${AWS::Region} --stack ${AWS::StackName} --resource ECSLaunchTemplate
            /opt/aws/bin/cfn-signal -e $? --region ${AWS::Region} --stack ${AWS::StackName} --resource ECSAutoScalingGroup
            /usr/bin/enable-ec2-spot-hibernation
            echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
            PATH=$PATH:/usr/local/bin
          - ECSCluster:
              Fn::ImportValue:
                !Sub ${EnvironmentName}-ECSClusterName

The load balancer config looks like this:

ApplicationLoadBalancerInternet:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    Name: !Sub ${EnvironmentName}-${Project}-ALB-Internet
    IpAddressType: !Ref ELBIpAddressType
    Type: !Ref ELBType
    Scheme: internet-facing
    Subnets:
      - Fn::ImportValue:
          !Sub ${EnvironmentName}-PublicSubnet1
      - Fn::ImportValue:
          !Sub ${EnvironmentName}-PublicSubnet2
      - Fn::ImportValue:
          !Sub ${EnvironmentName}-PublicSubnet3
    SecurityGroups:
      - Fn::ImportValue:
          !Sub ${EnvironmentName}-ALBInternetSecurityGroupID

As I said, it works fine with EC2 health checks, but when I switch to ELB health checks, the instances are drained and the ASG spins up a new instance.

Thanks, A

Is there any problem with replacing the instances? If you want, you could use containers so that only the container image is replaced. — sin, Oct 13 '18 at 02:27
Yes, but it still makes no sense. I want to know what's wrong with the instances, since if they are being replaced every 5 minutes, something must be wrong. — aerioeus, Oct 14 '18 at 18:37
What kind of application did you use with Application Auto Scaling? [Application Autoscaling - Resource Type](https://docs.aws.amazon.com/en_us/autoscaling/application/APIReference/API_RegisterScalableTarget.html#autoscaling-RegisterScalableTarget-request-ResourceId) — sin, Oct 15 '18 at 09:47
You should provide your template for us to help. My guess is that your ELB is not able to contact your instances for health check, therefore it keeps marking them as unhealthy and they are replaced. 5 minutes sounds like the default connection draining of 300 seconds. — tyron, Oct 17 '18 at 18:36
@tyron, I have added the templates; thanks for looking into this. A — aerioeus, Oct 19 '18 at 09:52
Can you double check that `!Sub ${EnvironmentName}-ECSInstancesSecurityGroupID` has Ingress permissions for `${EnvironmentName}-ALBInternetSecurityGroupID`? Also, side note, `TargetGroupARNs` on `ECSAutoScalingGroup` is missing the dashes on each line (it is supposed to be a list). — tyron, Oct 19 '18 at 13:21
@tyron, thank you. The ingress rules are as you surmised they should be, so I believe my issue lies somewhere else: I'm mounting an EFS file system into the container, and I believe that script is not clean. But it's probably better to open a new question for that, right? — aerioeus, Oct 19 '18 at 15:56
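The checks suggested in the comments (ingress from the ALB security group, why targets are unhealthy, and whether 5 minutes matches the 300-second draining default) can be run from the AWS CLI. A minimal Bash sketch, assuming a configured AWS CLI (credentials and region); the function names are hypothetical, and the `AWS_CLI` variable only exists so the helpers can be exercised against a stub:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the checks discussed in the comments.
# Assumes the AWS CLI is installed and configured (credentials, region).
set -euo pipefail

# Command used to call AWS; can be pointed at a stub for offline testing.
AWS_CLI="${AWS_CLI:-aws}"

# Print the value of a CloudFormation export, e.g. "prod-ALBInternetSecurityGroupID".
export_value() {
  "$AWS_CLI" cloudformation list-exports \
    --query "Exports[?Name=='$1'].Value" --output text
}

# Print the security group IDs used as sources in the ingress rules of group $1.
ingress_sources() {
  "$AWS_CLI" ec2 describe-security-groups --group-ids "$1" \
    --query 'SecurityGroups[0].IpPermissions[].UserIdGroupPairs[].GroupId' \
    --output text
}

# Succeed if group $1 has at least one ingress rule whose source is group $2.
# Only security-group sources are checked, not CIDR rules or port ranges.
sg_allows_source() {
  local src
  for src in $(ingress_sources "$1"); do
    if [ "$src" = "$2" ]; then
      return 0
    fi
  done
  return 1
}

# Show each target's health state and reason (e.g. Target.Timeout).
target_health() {
  "$AWS_CLI" elbv2 describe-target-health --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Reason]' \
    --output table
}

# Print the target group's deregistration delay in seconds (300 by default).
deregistration_delay() {
  "$AWS_CLI" elbv2 describe-target-group-attributes --target-group-arn "$1" \
    --query "Attributes[?Key=='deregistration_delay.timeout_seconds'].Value" \
    --output text
}
```

For example, with a hypothetical `ENV` holding your `EnvironmentName`, `sg_allows_source "$(export_value "$ENV-ECSInstancesSecurityGroupID")" "$(export_value "$ENV-ALBInternetSecurityGroupID")"` should succeed if the ingress rule tyron asked about exists, and `target_health` on the Web target group ARN shows the reason each target is marked unhealthy.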

score 0 · Answer 1 · answered Dec 30 '18 at 16:44

I would troubleshoot it like this:
1. Delete this stack.
2. Edit your template and change the ASG health-check type to ELB (for now).
3. Create a new stack from either the CLI or the console. I recommend the CLI, since you might have to recreate the stack several times, and that is far simpler and quicker than with the console. The most important part is to disable rollback; otherwise, when the stack fails, you won't be able to find out the reason for the failure.
4. I believe you will also be creating some IAM resources as part of this template, so an example CLI command for your quick reference would be: `aws cloudformation create-stack --stack-name Name-of-your-stack --template-body file://template.json --tags Key=Name,Value=Your_Tag_Value --profile default --region region --capabilities CAPABILITY_NAMED_IAM --disable-rollback`
5. For more information on the requirement of CAPABILITY_NAMED_IAM, see this SO answer.
6. Now, when you create the stack, it's still going to fail, but now we can troubleshoot it. The reason we set the health check type to ELB in step 2 is that we actually want the ASG to replace instances that fail health checks, so that we can find the reason in the ASG's "Activity History" tab in the console.
7. Chances are high that you will see a message far more meaningful than the one returned by CloudFormation.
8. Now that you have that error message, change the ASG's health check type to EC2 from the console, because we do not want the ASG to start a "launch and terminate" loop for the EC2 instances.
9. Now, log in to your EC2 instance and look in the access logs for the hits from your ELB health check. In httpd, a successful health check is logged with HTTP 200.
10. Also note that if the ELB health check is TCP:80, there must not be any port conflict on your server, and if you have selected HTTP:80, you must also have specified a path/file as your ping target.
11. Since your template has some user data as well, please also review /var/log/cfn-init.log and other log files for error messages. A simple option would be `grep error /var/log/*`.
12. At this point, you just have to make sure the ELB health check succeeds and the instance is "in-service" behind the ELB. The most important step is to document all your troubleshooting steps, because you never know which of the many steps you tried actually fixed the health check.
13. Once you find the cause, just put the fix in the template and you should be good to go. I have seen many templates go wrong at Step 8.
14. Finally, don't forget to change the ASG health check type back to ELB.
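The CLI parts of the steps above, plus the two on-instance log checks, can be sketched as Bash helpers. This is a sketch under assumptions, not a tested tool: it assumes a configured AWS CLI, the template's logical ID `ECSAutoScalingGroup`, httpd's default "combined" log format, and that ELB health-check requests carry a User-Agent containing `ELB-HealthChecker`. The function names and the `DRY_RUN` switch are hypothetical. If your template's parameters have no defaults, `create-stack` also needs `--parameters`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the troubleshooting steps above.
# Assumes a configured AWS CLI; the names passed in are your own placeholders.
set -euo pipefail

# With DRY_RUN=1, print AWS commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 3: create the stack with rollback disabled, so failed resources are kept.
create_stack() {  # $1 = stack name, $2 = template file
  run aws cloudformation create-stack --stack-name "$1" \
    --template-body "file://$2" \
    --capabilities CAPABILITY_NAMED_IAM --disable-rollback
}

# Physical name of the ASG created for the logical ID ECSAutoScalingGroup.
asg_name() {  # $1 = stack name
  run aws cloudformation describe-stack-resource --stack-name "$1" \
    --logical-resource-id ECSAutoScalingGroup \
    --query StackResourceDetail.PhysicalResourceId --output text
}

# Step 6: the ASG's activity history, including why instances were terminated.
show_activities() {  # $1 = ASG name
  run aws autoscaling describe-scaling-activities --auto-scaling-group-name "$1" \
    --query 'Activities[].[StartTime,StatusCode,Cause]' --output table
}

# Steps 8 and 14: switch the ASG health check type between EC2 and ELB.
set_health_check_type() {  # $1 = ASG name, $2 = EC2 or ELB
  case "$2" in
    EC2 | ELB) ;;
    *) echo "health check type must be EC2 or ELB" >&2; return 1 ;;
  esac
  run aws autoscaling update-auto-scaling-group --auto-scaling-group-name "$1" \
    --health-check-type "$2"
}

# Step 9 (on the instance): count ELB health-check requests per HTTP status.
# In the combined log format the status code is field 9. With pipefail set,
# this fails when the log contains no health-check requests at all.
health_check_statuses() {  # $1 = access log, e.g. /var/log/httpd/access_log
  grep 'ELB-HealthChecker' "$1" | awk '{ print $9 }' | sort | uniq -c
}

# Step 11 (on the instance): lines mentioning "error" in the top-level files of
# a log directory, such as /var/log/cfn-init.log and /var/log/cloud-init-output.log.
log_errors() {  # $1 = log directory, defaults to /var/log
  grep -i -n -H -s -d skip 'error' "${1:-/var/log}"/* || true
}
```

For example, `set_health_check_type "$(asg_name my-stack)" EC2` covers step 8, and the same call with `ELB` covers step 14; `health_check_statuses` and `log_errors` are meant to be run on the instance itself.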
