I can set up the following through the console, and I also have it as a CloudFormation template:
- a scalable target associated to my ALB,
- a CPU target tracking scaling policy,
- an ALBRequestCountPerTarget target tracking policy.
This all works really well. Creating the policies in my CloudFormation template also takes care of creating the associated scale-out and scale-in alarms.
The problem: the auto-created alarms only trigger after 3 breaching datapoints in 3 consecutive periods of 60 seconds. So, if a sudden load comes in, it takes 3 minutes for the ECS service to scale out. This is much too long for me; I want it to scale as fast as possible. And, from the documentation, it seems that the smallest period for the ALB RequestCountPerTarget metric is 60 seconds:
Only a period greater than 60s is supported for metrics in the "AWS/" namespaces
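To make the timing concrete: going by the behaviour described above, the auto-created scale-out alarm acts roughly like an alarm with the evaluation settings below. This is only a sketch for illustration; the values are an assumption based on what I observe, not an official definition, and the fragment is not a complete resource.

```yaml
# Sketch of the evaluation settings the auto-created HIGH alarm appears to use
# (assumption based on the observed behaviour, not an official definition).
Period: 60             # one datapoint per 60 seconds
EvaluationPeriods: 3   # look at the 3 most recent datapoints
DatapointsToAlarm: 3   # all 3 must breach, so about 3 x 60 s = 180 s before scale out
```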
Manual solution: right now, I can go to the CloudWatch console, find the HIGH and LOW alarms created for me, and edit the HIGH alarm (the one that triggers a scale out). I set the alarm's "Period" to 60 seconds, "DatapointsToAlarm" to 1 (trigger the scale-out action as soon as one datapoint breaches), "EvaluationPeriods" to 1 (take into account the previous 60-second period only), and "Threshold" to 500 (if there are more than 500 requests per target in the past 60 seconds, add capacity, i.e. scale out).
To test, I use JMeter and send a ton of requests and I can see that the alarm goes off in a minute or so and my ECS service adjusts the desired count of running tasks. This all works very well.
But now, we're all supposed to write infrastructure as code (IaC), right? So, I want the above console tweaks to be included in my CloudFormation templates. And that's where the problems occur. What I've done:
- I added two new alarms in my CloudFormation template: one for HIGH (scale out) and one for LOW (scale in),
- the two new alarms point to the existing scaling policy.
What happens:
- the alarms DO go into ALARM state within 60 seconds,
- the policy tries to run the action (scale out) but there is an error:
Failed to execute action
arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster/my-ecs-service:policyName/alb-requests-per-target-per-minute. Received error: ""
I have no idea what that means, other than that the alarm was raised correctly and the scale-out action was attempted but failed (with no error message provided!).
I compared this error with the successful actions (those triggered by the alarms AWS creates automatically with the scaling policy, i.e. the ones that fire at t = 3 minutes). The only difference I see is that the action ARN in the error message is missing the "createdBy" suffix that is appended to the action ARNs when the "default" alarms trigger the action:
Successfully executed action arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-ecs-service:policyName/tpg-cce-cpu-target-tracking-scaling-policy:createdBy/59b3e5ac-81ae-490f-8ecb-00241506a15e
Successfully executed action arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-ecs-service:policyName/alb-requests-per-target-per-minute:createdBy/ffacb0ac-2456-4751-b9c0-b909c66e9868
Note the difference above: the "createdBy" suffix is missing from the policy ARN when the action is triggered by my custom alarms. I have no idea how to get that suffix because, in CloudFormation, I use Ref to get the policy ARN, and there is no mention of appending any "createdBy" to it. This might not be the issue at all; it is simply the only difference I have found so far, so it may be a red herring.
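One way to check what Ref actually returns would be to expose it as a stack output and compare it with the ARNs shown in the alarm history. A minimal sketch, assuming the `ServiceScalingPolicyALB` resource from my template; the output name `ALBScalingPolicyArn` is made up for illustration:

```yaml
Outputs:
  ALBScalingPolicyArn:   # hypothetical output name
    Description: ARN returned by Ref for the ALB request-count scaling policy
    Value: !Ref ServiceScalingPolicyALB
```

The value shows up in the stack's Outputs tab after an update, which makes it easy to see whether it carries a "createdBy" suffix or not.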
Another clue, maybe, is that, when I go to the AWS console in Cloudwatch to look at the alarms:
- I CAN edit the Cloudwatch HIGH alarm that AWS created automatically when I create my policy,
- I CANNOT edit my custom "scale out" alarm (the one at the bottom of my CloudFormation template below). The error in the console when I tried to edit my custom alarm is:
Cannot edit myCloudFormationStack-ALBRequestsScaleOutAlarm-E4VY9ZOJ5DOF as it is an Auto Scaling alarm with target tracking scaling policies.
Yet another difference between my alarm and the one automatically created with the policy: when I look in the AWS console, in CloudWatch->Alarms, and I look at the alarm details, the "Actions" section looks different. For the automatically provisioned alarm, I see this:
When in alarm, execute this action arn:aws:autoscaling:us-east-1:MYACCOUNTID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-service:policyName/alb-requests-per-target-per-minute:createdBy/ffacb0ac-2456-4751-b9c0-b909c66e9868
But for my own alarm (defined in the CloudFormation template below), I see this in the alarm details (Actions):
When in alarm, use policy alb-requests-per-target-per-minute (Maintain the metric ALBRequestCountPerTarget at the target value of 1000.)
Here is my full CloudFormation template:
AWSTemplateFormatVersion: '2010-09-09'
Description: ECS task definition, service, and hooks it up to the ALB via a Target Group
# IMPORTANT: this needs the first Cloudformation layers in place (see the imports below)
Parameters:
  ContainerImageIdParam:
    Description: The ECR container image ID and tag to deploy
    Type: String
    Default: MYACCOUNTID.dkr.ecr.us-east-1.amazonaws.com/myapp:v10
  JDBCUrlParam:
    Description: The JDBC URL to the RDS database (use the Route53 DNS entry to your database, and NOT the AWS URL)
    Type: String
    Default: jdbc-secretsmanager:mysql://mysql.MYPIERREDNS.org:3306/MYDATABASE?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC
Resources:
  Task:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: myapp
      Cpu: 512
      Memory: 1024
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !ImportValue ECSTaskExecutionRole
      TaskRoleArn: !ImportValue ECSTaskRole
      ContainerDefinitions:
        - Name: myapp-container
          Image: !Ref ContainerImageIdParam
          Cpu: 512
          Memory: 1024
          Environment:
            - Name: JDBC_DB_URL
              Value: !Ref JDBCUrlParam
            - Name: JDBC_DB_DRIVER_CLASS
              Value: com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver
            - Name: JDBC_DB_USERNAME
              Value: dev/myapp/mysql
            - Name: JDBC_DB_PASSWORD
              Value: notUsedButECSWillErrorIfMissing
            - Name: DB_NUM_THREADS
              Value: 10
          PortMappings:
            - ContainerPort: 9000
              Protocol: tcp
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: /ecs/myapp
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: myapp-app
  Service:
    Type: AWS::ECS::Service
    DependsOn: ListenerRule
    Properties:
      ServiceName: myapp-service # todo if we remove the name, one will be automatically generated
      TaskDefinition: !Ref Task
      Cluster: !ImportValue ECSCluster
      LaunchType: FARGATE
      DesiredCount: 1 # set this to 0 if cloudformation has issues creating this stack (otherwise takes 3 hours and then fails/timeouts)
      DeploymentConfiguration:
        MaximumPercent: 200
        MinimumHealthyPercent: 0
      HealthCheckGracePeriodSeconds: 30
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          Subnets:
            - !ImportValue PrivateSubnet1
            - !ImportValue PrivateSubnet2
          SecurityGroups:
            - !ImportValue ECSServiceSecurityGroup
      LoadBalancers:
        - ContainerName: myapp-container
          ContainerPort: 9000
          TargetGroupArn: !Ref TargetGroup
  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: myapp-tg
      VpcId: !ImportValue VPC
      Port: 9000
      Protocol: HTTP
      Matcher:
        HttpCode: 200-299
      HealthCheckIntervalSeconds: 30
      HealthCheckPath: /myapp-svc/index.html
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 10
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 6
      TargetType: ip
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 30
  ListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      ListenerArn: !ImportValue LoadBalancerListenerHTTPS
      Priority: 20
      Conditions:
        - Field: path-pattern
          Values:
            - /myapp-svc/*
      Actions:
        - TargetGroupArn: !Ref TargetGroup
          Type: forward
  ECSAutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MaxCapacity: 6
      MinCapacity: 1
      ResourceId: !Join ["/", [service, !ImportValue ECSCluster, !GetAtt Service.Name]] # service/clusterName/serviceName = service/ecs-cluster-myapp/myapp-service
      RoleARN: !Sub 'arn:aws:iam::${AWS::AccountId}:role/aws-service-role/ecs.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_ECSService'
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs
  CPUUtilizationAutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: myapp-cpu-target-tracking-scaling-policy
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ECSAutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        DisableScaleIn: true # disable scale in for this policy to give ALBRequestPolicy the priority on scale in decisions
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        ScaleInCooldown: 300
        ScaleOutCooldown: 30
        TargetValue: 50 # Average 50% CPU utilization
  ServiceScalingPolicyALB:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: alb-requests-per-target-per-minute
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ECSAutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 1000
        ScaleInCooldown: 300
        ScaleOutCooldown: 30
        PredefinedMetricSpecification:
          PredefinedMetricType: ALBRequestCountPerTarget
          ResourceLabel: !Join
            - '/'
            - - !ImportValue ECSLoadBalancerFullName
              - !GetAtt TargetGroup.TargetGroupFullName
  # NOTE: the ALB RequestCountPerTarget metric alarms are automatically
  # created when we use that policy. But if we want a different evaluation period,
  # we need to define our own alarms. So, the new scale IN/OUT alarms are included below.
  # SCALE OUT ALARM: if the total (SUM) of ALB requests per target is ABOVE the
  # threshold a certain number of times in the past period, THEN send "scale out" alarm.
  ALBRequestsScaleOutAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      MetricName: RequestCountPerTarget
      Namespace: AWS/ApplicationELB # Only a period greater than 60s is supported for metrics in the "AWS/" namespaces
      ActionsEnabled: true
      AlarmActions:
        - !Ref ServiceScalingPolicyALB
      # OKActions: []
      # InsufficientDataActions: []
      Statistic: Sum
      Dimensions:
        - Name: LoadBalancer
          Value: !ImportValue ECSLoadBalancerFullName
        - Name: TargetGroup
          Value: !GetAtt TargetGroup.TargetGroupFullName
      Period: 60 # evaluation period (in seconds) = 1 datapoint
      EvaluationPeriods: 1 # number of previous periods to take into account
      DatapointsToAlarm: 1 # number of datapoints above threshold needed to generate an alarm
      Threshold: 1000 # alarm threshold: more than 1000 requests
      Unit: None
      ComparisonOperator: GreaterThanThreshold
  # SCALE IN ALARM: if the total (SUM) of ALB requests per target is BELOW the
  # threshold a certain number of times in the past period, THEN send "scale in" alarm.
  ALBRequestsScaleInAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      MetricName: RequestCountPerTarget
      Namespace: AWS/ApplicationELB
      Statistic: Sum
      Period: 60 # evaluation period (in seconds) = 1 datapoint
      EvaluationPeriods: 5 # number of previous periods to take into account
      DatapointsToAlarm: 1 # number of datapoints below threshold needed to generate an alarm
      Threshold: 500 # alarm threshold: less than 500 requests
      Unit: None
      AlarmActions:
        - !Ref ServiceScalingPolicyALB
      OKActions:
        - !Ref ServiceScalingPolicyALB
      Dimensions:
        - Name: LoadBalancer
          Value: !ImportValue ECSLoadBalancerFullName
        - Name: TargetGroup
          Value: !GetAtt TargetGroup.TargetGroupFullName
      ComparisonOperator: LessThanThreshold
Q1: what am I doing wrong? How do I specify an ACTION in CloudFormation so that my alarm triggers the same action as the one triggered by the auto-provisioned AWS alarms (the ones that get automatically created when I create my policy)?
Q2: is there a way to look at the ACTIONs in the AWS console? I'm guessing that AWS hides these things under the covers (maybe lambda or other?).
Q3: does anyone have another way to do this? Maybe with step scaling? I would also like to trigger in under 60 seconds if possible, so maybe I should move away from target tracking?
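For reference, this is roughly what I imagine the step scaling variant would look like: a StepScaling policy on the same scalable target, plus a custom alarm whose action is that policy. It is an untested sketch, not a working answer. The logical names `ALBStepScaleOutPolicy` and `ALBStepScaleOutAlarm`, the policy name, and the adjustment of +1 task are assumptions; the alarm settings are copied from my scale-out alarm above, and the two resources would go under the existing Resources section.

```yaml
Resources:
  ALBStepScaleOutPolicy:            # hypothetical logical name
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: alb-requests-step-scale-out   # hypothetical policy name
      PolicyType: StepScaling
      ScalingTargetId: !Ref ECSAutoScalingTarget
      StepScalingPolicyConfiguration:
        AdjustmentType: ChangeInCapacity
        Cooldown: 30
        MetricAggregationType: Average
        StepAdjustments:
          - MetricIntervalLowerBound: 0   # any breach above the alarm threshold
            ScalingAdjustment: 1          # assumption: add one task per breach

  ALBStepScaleOutAlarm:             # hypothetical logical name
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: AWS/ApplicationELB
      MetricName: RequestCountPerTarget
      Statistic: Sum
      Dimensions:
        - Name: LoadBalancer
          Value: !ImportValue ECSLoadBalancerFullName
        - Name: TargetGroup
          Value: !GetAtt TargetGroup.TargetGroupFullName
      Period: 60
      EvaluationPeriods: 1
      DatapointsToAlarm: 1
      Threshold: 1000
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref ALBStepScaleOutPolicy   # Ref returns the scaling policy ARN
```

A matching scale-in policy would presumably need a negative ScalingAdjustment with MetricIntervalUpperBound: 0 and its own LessThanThreshold alarm.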
If anyone has a working sample of a CloudFormation template that triggers in one minute or less based on the number of requests to the ALB, that would definitely be AWESOME :) I put that in a separate question (how to trigger in less than one minute): ECS Fargate autoscaling more rapidly?