
I'm using an HPA based on a custom metric on GKE.

The HPA is not working and it's showing me this error log:

unable to fetch metrics from custom metrics API: the server is currently unable to handle the request

When I run kubectl get apiservices | grep custom, I get:

v1beta1.custom.metrics.k8s.io services/prometheus-adapter False (FailedDiscoveryCheck) 135d

This is the HPA spec config:

spec:
  scaleTargetRef:
    kind: Deployment
    name: api-name
    apiVersion: apps/v1
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Object
      object:
        target:
          kind: Service
          name: api-name
          apiVersion: v1
        metricName: messages_ready_per_consumer
        targetValue: '1'

and this is the Service's spec config:

spec:
  ports:
    - name: worker-metrics
      protocol: TCP
      port: 8080
      targetPort: worker-metrics
  selector:
    app.kubernetes.io/instance: api
    app.kubernetes.io/name: api-name
  clusterIP: 10.8.7.9
  clusterIPs:
    - 10.8.7.9
  type: ClusterIP
  sessionAffinity: None
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack

What should I do to make it work?

mohamed wael thabet

3 Answers


First of all, confirm that the Metrics Server Pod is running in your kube-system namespace. You can also use the following manifest:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        imagePullPolicy: Always
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
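Assuming you save the manifest above as metrics-server.yaml (the filename is just an example), you can apply it and confirm the Pod comes up like this:

```shell
# Apply the manifest and wait for the rollout to complete
kubectl apply -f metrics-server.yaml
kubectl -n kube-system rollout status deployment/metrics-server

# Confirm the Pod is Running (it carries the k8s-app=metrics-server label)
kubectl get pods -n kube-system -l k8s-app=metrics-server
```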

If so, take a look at the logs and search for any lines mentioning the Stackdriver adapter. This issue is commonly caused by a problem with the custom-metrics-stackdriver-adapter, which usually crashes in the metrics-server namespace. To solve that, use the resource from this URL, and for the deployment, use this image:

gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.1
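To inspect the adapter and swap its image, something like the following should work. This is a sketch: the custom-metrics namespace and the custom-metrics-stackdriver-adapter deployment name come from the stock GKE adapter manifest, so adjust them if your setup differs:

```shell
# Check whether the adapter Pod is crash-looping
kubectl get pods -n custom-metrics

# Look for errors in the adapter's logs
kubectl logs -n custom-metrics deployment/custom-metrics-stackdriver-adapter

# Point every container in the deployment at the suggested image
# ('*' updates all containers, so we don't have to guess the container name)
kubectl set image -n custom-metrics deployment/custom-metrics-stackdriver-adapter \
  '*=gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.1'
```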

Another common root cause is an OOM issue. In this case, adding more memory solves the problem. To assign more memory, specify the new amount in the Pod's manifest, as the following example shows:

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: mem-example
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]

In the above example, the container has a memory request of 100 MiB and a memory limit of 200 MiB. In the manifest, the "--vm-bytes", "150M" argument tells the container to attempt to allocate 150 MiB of memory. You can visit the official Kubernetes documentation for more details on memory settings.

You can use the following threads for more reference: GKE - HPA using custom metrics - unable to fetch metrics, Stackdriver-metadata-agent-cluster-level gets OOMKilled, and Custom-metrics-stackdriver-adapter pod keeps crashing.
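A quick way to check whether a Pod was OOMKilled (substitute your actual Pod name and namespace):

```shell
# Print each container's last termination reason; "OOMKilled" confirms the memory issue
kubectl get pod <pod-name> \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{"\n"}{end}'

# Alternatively, inspect the "Last State" section of the Pod description
kubectl describe pod <pod-name> | grep -i -A 3 "Last State"
```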

  • I have a metrics server pod running: kubectl get pods -n kube-system | grep metrics-server returns metrics-server-v0.4.4-c9bf648dc-dk476 2/2 Running 0 17h – mohamed wael thabet Mar 11 '22 at 08:51
  • Great @mohamedwaelthabet. Then please follow the other 2 steps, the one about the logs and the one about the OOM issue, and let us know the result. I just edited my answer adding how to increase the memory amount. – Nestor Daniel Ortega Perez Mar 11 '22 at 15:14
  • What's the "pod" we are talking about in the configuration file, please? – mohamed wael thabet Mar 14 '22 at 10:22
  • You posted 2 manifests, one for the HPA and the other one for the Service. There are no Pods listed. I just asked you to confirm that the Metrics Server Pod is running, and you did. If you want to know which Pod your Service is pointing to, run kubectl get pod -l app.kubernetes.io/instance=api,app.kubernetes.io/name=api-name. Plus, please follow the instructions I gave you regarding the stackdriver adapter and the other for the OOM issue, and share the results with us. – Nestor Daniel Ortega Perez Mar 14 '22 at 23:10
  • @mohamedwaelthabet Was the information posted in the answer helpful for you? Or do you consider that you need more information in order to resolve your issue or doubt? – Nestor Daniel Ortega Perez Mar 17 '22 at 18:55

Adding this block to my EKS node security group rules solved the issue for me:

node_security_group_additional_rules = {
  ...
  ingress_cluster_metricserver = {
    description                   = "Cluster to node 4443 (Metrics Server)"
    protocol                      = "tcp"
    from_port                     = 4443
    to_port                       = 4443
    type                          = "ingress"
    source_cluster_security_group = true 
  }
  ...
}
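After applying the rule (this snippet assumes the block above lives in a Terraform configuration using the terraform-aws-eks module's node_security_group_additional_rules input), you can verify that the metrics APIService recovers:

```shell
# Apply the Terraform change
terraform apply

# The APIService should report Available=True once the API server can reach
# the Metrics Server on port 4443
kubectl get apiservices | grep metrics
```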

What do you get for kubectl get pod -l "app.kubernetes.io/instance=api,app.kubernetes.io/name=api-name"? There should be a pod to which the service refers. If there is a pod, check its logs with kubectl logs <pod-name>. You can add -f to the kubectl logs command to follow the logs.