HPA scale down not happening properly

Question

I created an HPA for my deployment. Scaling up to the maximum number of replicas (6 in my case) works fine. When the load drops, it scales down to 5, but it is supposed to return to the original replica count (1 in my case) once the load is back to normal. I checked after 30-40 minutes and the application still has 5 replicas; it should have 1.

[ec2-user@ip-192-168-x-x ~]$ kubectl describe hpa admin-dev -n dev

Name: admin-dev
Namespace: dev
Labels: <none>
Annotations: <none>
CreationTimestamp: Thu, 24 Oct 2019 07:36:32 +0000
Reference: Deployment/admin-dev
Metrics: ( current / target )
resource memory on pods (as a percentage of request): 49% (1285662037333m) / 60%
Min replicas: 1
Max replicas: 10
Deployment pods: 3 current / 3 desired
Conditions:
  Type           Status Reason             Message
  ----           ------ ------             -------
  AbleToScale    True   ReadyForNewScale   recommended size matches current size
  ScalingActive  True   ValidMetricFound   the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request)
  ScalingLimited False  DesiredWithinRange the desired count is within the acceptable range 

Events:
  Type   Reason            Age   From                      Message
  ----   ------            ----  ----                      -------
  Normal SuccessfulRescale 13m   horizontal-pod-autoscaler New size: 2; reason: memory resource utilization (percentage of request) above target
  Normal SuccessfulRescale 5m27s horizontal-pod-autoscaler New size: 3; reason: memory resource utilization (percentage of request) above target

but what's the percentage of CPU load? Is it below threshold? — suren, Oct 24 '19 at 06:37
@LakshmiReddy could you please paste the output of `kubectl describe hpa ` ? — vesna, Oct 24 '19 at 06:54
@vesna , [ec2-user@ip-192-168-x-x ~]$ kubectl describe hpa admin-dev -n dev Name: admin-dev Namespace: dev Labels: Annotations: CreationTimestamp: Thu, 24 Oct 2019 07:36:32 +0000 Reference: Deployment/admin-dev Metrics: ( current / target ) resource memory on pods (as a percentage of request): 49% (1285662037333m) / 60% Min replicas: 1 Max replicas: 10 Deployment pods: 3 current / 3 desired — Lakshmi Reddy, Oct 24 '19 at 07:59
@vesna, Conditions: Type Status Reason Message ---- ------ ------ ------- AbleToScale True ReadyForNewScale recommended size matches current size ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request) ScalingLimited False DesiredWithinRange the desired count is within the acceptable range — Lakshmi Reddy, Oct 24 '19 at 07:59
@vensa, Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal SuccessfulRescale 13m horizontal-pod-autoscaler New size: 2; reason: memory resource utilization (percentage of request) above target Normal SuccessfulRescale 5m27s horizontal-pod-autoscaler New size: 3; reason: memory resource utilization (percentage of request) above target — Lakshmi Reddy, Oct 24 '19 at 07:59
the output you posted does not seem to correspond to the question you asked. — Markus Dresch, Oct 25 '19 at 07:09

weibeld · Answer 1 · 2019-10-26T14:09:18.890

When the load decreases, the HPA intentionally waits a certain amount of time before scaling the app down. This is known as the cooldown delay, and it helps prevent the app from being scaled up and down too frequently. As a result, for a certain time the app keeps running at the previous high replica count even though the metric value is well below the target. This may look as if the HPA isn't responding to the decreased load, but it eventually will.

However, the default duration of the cooldown delay is 5 minutes. So if the app still hasn't been scaled down after 30-40 minutes, that's strange, unless the cooldown delay has been set to something else with the --horizontal-pod-autoscaler-downscale-stabilization flag of the controller manager.
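To make the cooldown concrete, here is a minimal Python sketch of how a downscale stabilization window behaves: among the recommendations computed inside the window, the highest one wins. The function name, the history format, and the sample timestamps are made up for illustration; this is not the controller's actual code.

```python
from typing import List, Tuple


def stabilized_replicas(
    history: List[Tuple[float, int]],
    now: float,
    window_seconds: float = 300.0,  # 5 minutes, the default mentioned above
) -> int:
    """Return the highest replica recommendation made within the window.

    `history` holds (timestamp_in_seconds, recommended_replicas) pairs and
    must include the most recent recommendation.
    """
    recent = [replicas for ts, replicas in history if now - ts <= window_seconds]
    if not recent:
        raise ValueError("history has no recommendation inside the window")
    return max(recent)


# Hypothetical timeline: the load drops after the first two minutes.
history = [(0, 6), (60, 6), (120, 2), (180, 1), (240, 1)]

early = stabilized_replicas(history, now=240)  # old high recommendations still count
later = stabilized_replicas(history, now=400)  # the 6-replica entries have aged out
print(early, later)
```

So a scale-down is only delayed by the window. Once the old high recommendations fall out of it, the lower recommendation takes effect.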

In the output that you posted the metric value is 49% with a target of 60% and the current replica count is 3. This seems actually not too bad.
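For reference, the Kubernetes documentation gives the replica calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), and skips scaling when the ratio is within a tolerance of 1.0 (0.1 by default). A small Python sketch applied to the numbers from the posted output (the function name is mine, and this is only an approximation of what the controller does):

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    tolerance: float = 0.1,
) -> int:
    """Approximate the documented HPA replica calculation."""
    ratio = current_value / target_value
    # Close enough to the target: the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * current_value / target_value)


# Values from the `kubectl describe hpa` output: 3 replicas, 49% / 60%.
print(desired_replicas(3, 49, 60))  # 3 * 49 / 60 = 2.45, rounded up to 3
```

In other words, at 49% against a 60% target the HPA has no reason to go below 3 replicas. The average utilisation would have to drop further before the formula yields 2.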

An issue might be that you're using the memory utilisation as a metric, which is not a good autoscaling metric.

An autoscaling metric should linearly respond to the current load across the replicas of the app. If the number of replicas is doubled, the metric value should halve, and if the number of replicas is halved, the metric value should double. The memory utilisation in most cases doesn't show this behaviour. For example, if each replica uses a fixed amount of memory, then the average memory utilisation across the replicas stays roughly the same regardless of how many replicas were added or removed. The CPU utilisation generally works much better in this regard.
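The difference can be illustrated with a rough Python simulation. It applies the documented formula ceil(currentReplicas * currentMetricValue / desiredMetricValue) repeatedly until the replica count stops changing, once for a metric that is shared across replicas and once for a metric that stays fixed per replica. The two metric models and their numbers are assumptions for illustration, and the tolerance and stabilization window are ignored.

```python
import math
from typing import Callable


def settle(
    replicas: int,
    utilization_at: Callable[[int], float],
    target: float,
    min_replicas: int = 1,
    max_rounds: int = 100,
) -> int:
    """Apply the HPA formula until the replica count stops changing."""
    for _ in range(max_rounds):
        desired = max(
            min_replicas,
            math.ceil(replicas * utilization_at(replicas) / target),
        )
        if desired == replicas:
            break
        replicas = desired
    return replicas


TARGET = 60.0


def cpu_like(n: int) -> float:
    # Load-proportional metric: a fixed total load (50% of one pod's
    # request) is spread evenly over n replicas.
    return 50.0 / n


def memory_like(n: int) -> float:
    # Fixed-footprint metric: every replica sits at 49% regardless of n.
    return 49.0


print(settle(6, cpu_like, TARGET))     # settles at 1 replica
print(settle(6, memory_like, TARGET))  # stays stuck at 5 replicas
```

With the fixed-footprint metric, 6 replicas drop to 5 and then stay there, which is the behaviour described in the question.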

While scaling down, does Kubernetes attempt to remove a pod with the lowest CPU utilization in order to avoid disrupting ongoing work done by pods with high CPU usage? or is it purely a random selection? — user482594, Jun 22 '21 at 03:40
The scaling by the HPA is done through the scale subresource API (see https://github.com/kubernetes/community/blob/master/contributors/design-proposals/autoscaling/horizontal-pod-autoscaler.md#scale-subresource). You would need to look in the implementation of this API. I'm not sure if there's a specific logic selecting which Pod to remove, but I guess it's just random. — weibeld, Jun 22 '21 at 10:11

score 0 · Answer 2 · answered Oct 25 '19 at 08:23

In this case Horizontal Pod Autoscaler is working as designed.

Autoscaler can be configured to use one or more metrics.

Autoscaling based on a single metric: the autoscaler sums the metric values of all the pods, divides that by the target value set on the HorizontalPodAutoscaler resource, and then rounds up to the next-larger integer.

desired_replicas = ceil(sum(utilization) / desired_utilization)

Example: the HPA is configured to scale on CPU with a target of 30%. If CPU usage is 97%, then 97% / 30% = 3.23, and the HPA rounds it up to 4 (the next-larger integer).

Autoscaling based on multiple pod metrics - calculates the replica count for each metric individually and then takes the highest value.

Example: if three pods are required to achieve the target CPU usage, and two pods are required to achieve the target memory usage, the Autoscaler will scale to three pods - highest number needed to meet the target.
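The two rules above can be sketched in a few lines of Python. The function names and the per-pod sample values in the multi-metric case are hypothetical, chosen so that CPU needs three pods and memory needs two, as in the example.

```python
import math
from typing import List, Sequence, Tuple


def replicas_for_metric(pod_values: Sequence[float], target: float) -> int:
    """Single metric: sum over the pods, divide by the target, round up."""
    return math.ceil(sum(pod_values) / target)


def replicas_for_metrics(metrics: List[Tuple[Sequence[float], float]]) -> int:
    """Multiple metrics: compute each one separately, take the highest."""
    return max(replicas_for_metric(values, target) for values, target in metrics)


# Single-metric example from above: 97% usage against a 30% target.
print(replicas_for_metric([97], 30))  # 97 / 30 = 3.23, rounded up to 4

# Multi-metric example with assumed per-pod values and a 60% target each.
cpu = ([70, 80], 60)     # 150 / 60 = 2.5, so 3 pods
memory = ([55, 50], 60)  # 105 / 60 = 1.75, so 2 pods
print(replicas_for_metrics([cpu, memory]))  # the higher of the two: 3
```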

Autoscaling on custom metrics - allows you to scale up/down based on non-resource metric types, for example scaling your frontend application based on Queries-Per-Second.

I hope it helps.

Thanks for the detailed explanation. My case is a bit different: the deployment file requests 2000 Mi of memory with 1 replica, and the HPA is configured for 80% utilisation. When the app first launches, the HPA shows 46% usage with 1 replica, as configured in the deployment file. When a memory spike happens, the HPA scale-up works fine and launches additional pods. The load then slowly drops and memory goes back to 45%. Ideally there should be 1 replica since there is no load, but there are still 2. I want the HPA to bring it down to 1. Any suggestions? — Lakshmi Reddy, Oct 28 '19 at 07:28

score 0 · Answer 3 · answered Mar 23 '22 at 01:59

I answered this on GitHub: https://github.com/kubernetes/kubernetes/issues/78761#issuecomment-1075814510

Here's a summary: the problem is in the calculation that decides whether to scale down or up. The equation works for scaling down when the change in utilization due to the load difference is big, which is usually the case with CPU (e.g. 100m - 500m <=> 20% - 100%), but it fails when the change in utilization is small, which is usually the case with memory (e.g. 160Mi - 200Mi <=> 80% - 100%). For now it's better to stick to the CPU metric and make sure currentMetricValue at idle is at most half of desiredMetricValue. You can apply this rule to both metrics to make sure it always scales down:

currentMetricValue * 2 <= desiredMetricValue
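As a quick check of that rule, here is a Python sketch that compares it with the documented formula ceil(currentReplicas * currentMetricValue / desiredMetricValue) for the last step from 2 replicas to 1. The first pair uses the numbers from the asker's comment on another answer (45% at idle, 80% target); the other pairs and the function name are made up for illustration.

```python
import math


def can_drop_to_one(idle_value: float, target_value: float) -> bool:
    """True if the HPA formula takes 2 replicas down to 1 at idle."""
    return math.ceil(2 * idle_value / target_value) <= 1


cases = [
    (45, 80),   # from the asker's comment: 45% idle, 80% target
    (40, 80),   # idle is exactly half of the target
    (20, 100),  # CPU-like case: idle far below the target
]

for idle, target in cases:
    rule_holds = idle * 2 <= target
    print(idle, target, rule_holds, can_drop_to_one(idle, target))
```

For the 45% / 80% case both checks come out false: 2 * 45 / 80 = 1.125 rounds up to 2, so the deployment stays at 2 replicas.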

Raviraj Pophale · Answer 4 · 2023-08-25T06:59:23.433

Change the autoscaling policy to keep only the CPU utilization metric. For most applications, the CPU metric works properly. Use a memory metric in the autoscaling policy only if the app is memory-driven.

Ref.: https://docs.openshift.com/container-platform/4.8/nodes/pods/nodes-pods-autoscaling.html

edited Aug 25 '23 at 06:59

answered Aug 25 '23 at 06:51

Raviraj Pophale
