
We have a simple example of target tracking autoscaling configured for an ECS containerized application, based on CPU and memory. The code below auto-creates 4 CloudWatch alarms (2 for CPU: 1 scale-up, 1 scale-down; and 2 for memory: 1 scale-up, 1 scale-down).

We see that when the CloudWatch alarms trigger for scaling up, our ECS service tasks scale up instantaneously (on the ECS side, events setting the desired count upwards are present straight away). However, we are observing different behaviour when the CloudWatch alarms trigger for scaling down:

  1. Sometimes the ECS service tasks scale down straight away (the scale-down alarm goes off straight away and the set-desired-count-downwards event is present straight away on the ECS side)
  2. Sometimes the ECS service tasks scale down after a delay, e.g. 7-15 minutes later, or even a few hours later (the scale-down alarm goes off straight away, but the set-desired-count-downwards event on the ECS side is delayed by 7-15 minutes, or a few hours)
  3. Sometimes the ECS service tasks do not scale down at all (we saw over the weekend that scale-down alarms were triggered but the ECS service tasks never scaled down over a 48-hour period, and the set-desired-count-downwards event never reached the ECS side)

On the CloudWatch alarm side we are observing that the alarms always go off when expected for both scaling up and down; it's on the ECS side that we think the issue resides.

The autoscaling code is as follows:

resource "aws_appautoscaling_target" "this" {
  max_capacity       = 5
  min_capacity       = 1
  resource_id        = "service/dev/service1"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "memory" {
  name               = "memory"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.this.resource_id
  scalable_dimension = aws_appautoscaling_target.this.scalable_dimension
  service_namespace  = aws_appautoscaling_target.this.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }

    scale_in_cooldown  = 60
    scale_out_cooldown = 60
    target_value       = 50
  }
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.this.resource_id
  scalable_dimension = aws_appautoscaling_target.this.scalable_dimension
  service_namespace  = aws_appautoscaling_target.this.service_namespace

  target_tracking_scaling_policy_configuration {

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    scale_in_cooldown  = 60
    scale_out_cooldown = 60
    target_value       = 60
  }
}

Has anyone seen this behaviour, i.e. alarms in CloudWatch going off correctly, and the ECS service always scaling up when expected but not always scaling down when expected? Are we missing something obvious here? Help greatly appreciated.

bstack

1 Answer


Check your policy configuration. When you have multiple target tracking scaling policies on the same scalable target, all of them must be ready to scale in before the service actually scales down.

If your goal is to scale down after inactivity, you can try disabling scale-in on certain policies to reduce the number of variables in the scale-down decision, and/or raising the target utilization on certain policies. Intermittent activity can be a signal to a given policy that it shouldn't scale down yet; it needs sustained low activity on every metric before the service scales down.
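
For example, here is a minimal sketch of the first suggestion, reusing the memory policy from the question: setting disable_scale_in = true keeps that policy's scale-out alarms but removes it from the scale-in decision, so only the CPU policy has to be ready before the desired count drops. This is a sketch, not a drop-in fix; adjust it to whichever metric you want to stop vetoing scale-in.

# Sketch only: memory can still scale the service out, but no longer blocks scale-in.
resource "aws_appautoscaling_policy" "memory" {
  name               = "memory"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.this.resource_id
  scalable_dimension = aws_appautoscaling_target.this.scalable_dimension
  service_namespace  = aws_appautoscaling_target.this.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }

    # Scale-out still happens from this policy; scale-in is left to the CPU policy.
    disable_scale_in   = true
    scale_out_cooldown = 60
    target_value       = 50
  }
}

Raising target_value on the memory policy is the softer variant of the same idea: the memory scale-in condition becomes easier to satisfy, so it stops holding back the scale-in that the CPU policy is ready for.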

Tony
  • is this "they must all be ready to scale down together" documented anywhere? – PragmaticProgrammer Jun 28 '23 at 17:36
  • found the source: "It will scale out the scalable target if any of the target tracking policies are ready for scale out, but will scale in only if all of the target tracking policies (with the scale-in portion enabled) are ready to scale in." - https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-target-tracking.html – PragmaticProgrammer Jun 28 '23 at 17:46