We have a simple example of target tracking autoscaling configured for an ECS containerized application, based on CPU and memory. Four alarms are autoconfigured by the code below: two for CPU (one scale-up, one scale-down) and two for memory (one scale-up, one scale-down).
We see that when the CloudWatch alarms trigger for autoscaling up, our ECS service tasks scale up instantaneously (on the ECS side, events setting the desired count upwards appear straight away). However, we are observing different behaviour when the CloudWatch alarms trigger for autoscaling down:
- Sometimes the ECS service tasks scale down straight away (the scale-down alarm goes off straight away, and the event setting the desired count downwards appears straight away on the ECS side).
- Sometimes the ECS service tasks scale down after a delay, e.g. 7-15 minutes or even a few hours later (the scale-down alarm goes off straight away, but the event setting the desired count downwards on the ECS side is delayed by 7-15 minutes, or by a few hours).
- Sometimes the ECS service tasks never scale down at all (over the weekend we saw scale-down alarms trigger, but the ECS service tasks never scaled down over a 48-hour period, and no event setting the desired count downwards ever reached the ECS side).
On the CloudWatch alarm side, we observe that the alarms always go off when expected for both scaling up and scaling down; it's on the ECS side that we think the issue resides.
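For reference, our understanding is that every scale-in attempt, including any that Application Auto Scaling delays or blocks, should show up in the scaling activity history, which can be pulled with something like the following (using the resource ID from our code below):

aws application-autoscaling describe-scaling-activities \
  --service-namespace ecs \
  --resource-id service/dev/service1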
The autoscaling code is as follows:
resource "aws_appautoscaling_target" "this" {
  max_capacity       = 5
  min_capacity       = 1
  resource_id        = "service/dev/service1"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "memory" {
  name               = "memory"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.this.resource_id
  scalable_dimension = aws_appautoscaling_target.this.scalable_dimension
  service_namespace  = aws_appautoscaling_target.this.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
    scale_in_cooldown  = 60 # seconds
    scale_out_cooldown = 60 # seconds
    target_value       = 50 # target 50% average memory utilization
  }
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.this.resource_id
  scalable_dimension = aws_appautoscaling_target.this.scalable_dimension
  service_namespace  = aws_appautoscaling_target.this.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    scale_in_cooldown  = 60 # seconds
    scale_out_cooldown = 60 # seconds
    target_value       = 60 # target 60% average CPU utilization
  }
}
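For completeness, one variant we could test (a sketch only, not something we have deployed) disables scale-in on the memory policy via the provider's disable_scale_in flag, so that only the CPU policy would ever lower the desired count:

resource "aws_appautoscaling_policy" "memory" {
  name               = "memory"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.this.resource_id
  scalable_dimension = aws_appautoscaling_target.this.scalable_dimension
  service_namespace  = aws_appautoscaling_target.this.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
    # With scale-in disabled, this policy can still add tasks on high
    # memory but never removes them; scale-in decisions would then be
    # driven by the CPU policy alone.
    disable_scale_in   = true
    scale_out_cooldown = 60
    target_value       = 50
  }
}

The idea would be to rule out the two target tracking policies interacting on scale-in; scale-out behaviour should be unchanged, since both policies could still raise the desired count.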
Has anyone seen this behaviour, i.e. CloudWatch alarms going off correctly and the ECS service always scaling up when expected, but not always scaling down when expected? Are we missing something obvious here? Help greatly appreciated.