I am using a Horizontal Pod Autoscaler (HPA) in AKS (the manifest is shown below). My containers run a Flask API server that handles a POST request. I start Flask with the following lines so that it runs threaded:
if __name__ == "__main__":
    # threaded=True makes the development server handle each request in its own thread
    app.run(host='0.0.0.0', port=5003, threaded=True)
When I send 20 requests to the Flask server running locally, it handles all of them, albeit very slowly. When I send 20 requests to the AKS deployment for the first time (while only 1 pod is running), I get error responses. The second time, after the number of pods has increased, I get 20 responses without any errors.
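For reference, a burst like the one described above could be reproduced with a short script along these lines. This is only a sketch: the URL, the `/predict` path, the JSON payload, and the helper names `post_once` and `fire_burst` are placeholders rather than the real values, and it assumes the 20 calls are sent concurrently.

```python
import json
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Placeholder endpoint and payload: substitute the real service address and body.
URL = "http://localhost:5003/predict"
PAYLOAD = {"example": "data"}


def post_once(url=URL, payload=PAYLOAD, timeout=30):
    """Send one POST request and return the HTTP status code, or an error label."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP response at all (connection refused, timeout, reset).
        return type(exc).__name__


def fire_burst(send=post_once, count=20):
    """Call `send` `count` times in parallel and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(send) for _ in range(count)]
        return Counter(future.result() for future in futures)
```

Calling `fire_burst()` returns a tally of status codes and error labels, which makes it easy to compare the first run (1 pod) with the second run (after scaling).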
Now I am trying to figure out why the requests are not held until an existing pod becomes available or a new pod has been created. I thought that some part of AKS would do that.
Please let me know if I am missing something!
Deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: *hidden*
spec:
  selector:
    matchLabels:
      app: *hidden*
  template:
    metadata:
      labels:
        app: *hidden*
    spec:
      containers:
      - name: *hidden*
        image: *hidden*
        env:
        - name: *hidden*
          valueFrom:
            secretKeyRef:
              name: *hidden*
              key: *hidden*
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "300m"
            memory: "400Mi"
          limits:
            cpu: "300m"
            memory: "400Mi"
        ports:
        - containerPort: 5003
      imagePullSecrets:
      - name: *hidden*
---
apiVersion: v1
kind: Service
metadata:
  name: *hidden*
spec:
  selector:
    app: *hidden*
  ports:
  - port: 5003
    protocol: TCP
    targetPort: 5003
  type: LoadBalancer
hpa.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: *hidden*
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: *hidden*
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 20
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 20
        periodSeconds: 60
    scaleDown:
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60
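
To make the 20% CPU target above concrete, here is a rough sketch of the replica calculation documented for the Horizontal Pod Autoscaler, `desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)`. The function name and its parameters are made up for illustration. The tolerance of 0.1 is the documented default and may be set differently on a given cluster. The sketch ignores the `behavior` policies, stabilization windows, and pods that are not yet ready.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization=20,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a CPU utilization target.

    Utilization values are percentages of the pods' CPU *request* (300m here).
    """
    ratio = current_utilization / target_utilization
    # Within the tolerance band around 1.0 the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas / maxReplicas bounds from the manifest.
    return max(min_replicas, min(max_replicas, wanted))
```

Under these assumptions, `desired_replicas(1, 100)` gives 5: a single pod running at 100% of its CPU request would be scaled to five pods. The calculation is driven by CPU metrics that are only available after the load has arrived, so on its own it describes how the replica count catches up, not a mechanism that holds incoming requests.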