
I was hoping to get a little help; my Google-Fu didn't get me much closer. I'm trying to install the metrics server on my Fedora CoreOS Kubernetes 4-node cluster like so:

kubectl apply -f deploy/kubernetes/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created

The service never seems to start:

kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"apiregistration.k8s.io/v1beta1","kind":"APIService","metadata":{"annotations":{},"name":"v1beta1.metrics.k8s.io"},"spec":{"...
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2020-03-04T16:53:33Z
  Resource Version:    1611816
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 65d9a56a-c548-4d7e-a647-8ce7a865a266
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-03-04T16:53:33Z
    Message:               failing or missing response from https://10.3.230.59:443/apis/metrics.k8s.io/v1beta1: bad status from https://10.3.230.59:443/apis/metrics.k8s.io/v1beta1: 403
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

Here is what I have found while diagnosing, from Googling around:

kubectl get deploy,svc -n kube-system |egrep metrics-server
deployment.apps/metrics-server   1/1     1            1           8m7s
service/metrics-server   ClusterIP   10.3.230.59   <none>        443/TCP         8m7s

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
Error from server (ServiceUnavailable): the server is currently unable to handle the request
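
Another check that might help here (if the Service's selector matches the pod, the Endpoints should list the pod IP on container port 4443):

kubectl get endpoints metrics-server -n kube-system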

kubectl get all --all-namespaces | grep -i metrics-server
kube-system      pod/metrics-server-75b5d446cd-zj4jm                              1/1     Running   0          9m11s
kube-system   service/metrics-server   ClusterIP      10.3.230.59    <none>        443/TCP                                     9m11s
kube-system      deployment.apps/metrics-server   1/1     1            1           9m11s
kube-system      replicaset.apps/metrics-server-75b5d446cd   1         1         1       9m11s

kubectl logs -f metrics-server-75b5d446cd-zj4jm -n kube-system
I0304 16:53:36.475657       1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
W0304 16:53:38.229267       1 authentication.go:296] Cluster doesn't provide requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
I0304 16:53:38.267760       1 secure_serving.go:116] Serving securely on [::]:4443
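
The warning above points at the extension-apiserver-authentication ConfigMap, so one quick way to see whether the cluster publishes a request-header client CA is to grep that ConfigMap (if the requestheader-* keys are missing, the aggregation layer likely isn't configured):

kubectl get configmap extension-apiserver-authentication -n kube-system -o yaml | grep requestheader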

kubectl get -n kube-system deployment metrics-server -o yaml | grep -i args -A 10
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"k8s-app":"metrics-server"},"name":"metrics-server","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"metrics-server"}},"template":{"metadata":{"labels":{"k8s-app":"metrics-server"},"name":"metrics-server"},"spec":{"containers":[{"args":["--cert-dir=/tmp","--secure-port=4443","--kubelet-insecure-tls","--kubelet-preferred-address-types=InternalIP"],"image":"k8s.gcr.io/metrics-server-amd64:v0.3.6","imagePullPolicy":"IfNotPresent","name":"metrics-server","ports":[{"containerPort":4443,"name":"main-port","protocol":"TCP"}],"securityContext":{"readOnlyRootFilesystem":true,"runAsNonRoot":true,"runAsUser":1000},"volumeMounts":[{"mountPath":"/tmp","name":"tmp-dir"}]}],"nodeSelector":{"beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64"},"serviceAccountName":"metrics-server","volumes":[{"emptyDir":{},"name":"tmp-dir"}]}}}}
  creationTimestamp: "2020-03-04T16:53:33Z"
  generation: 1
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
  resourceVersion: "1611810"
  selfLink: /apis/apps/v1/namespaces/kube-system/deployments/metrics-server
  uid: 006e758e-bd33-47d7-8378-d3a8081ee8a8
spec:
--
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        imagePullPolicy: IfNotPresent
        name: metrics-server
        ports:
        - containerPort: 4443
          name: main-port

Finally, my deployment config:

 spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        command:
          - /metrics-server
          - --kubelet-insecure-tls
          - --kubelet-preferred-address-types=InternalIP
        args:
          - --cert-dir=/tmp
          - --secure-port=4443
          - --kubelet-insecure-tls
          - --kubelet-preferred-address-types=InternalIP
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/os: linux
        kubernetes.io/arch: "amd64"

I'm at a loss as to what it could be. I just want to get the metrics service to start and have a basic kubectl top node display any info; all I get is:

Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)

I have searched the internet and tried adding the following args: and command: lines, but no luck:

command:
  - /metrics-server
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP
args:
  - --cert-dir=/tmp
  - --secure-port=4443
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP

Can anyone shed light on how to fix this? Thanks

Pastebin log file: Log File

Frank S
  • I'm guessing that you are using `kubeadm`. Could you provide the versions of kubeadm, kubectl, docker, etc.? Also, what CNI are you using? I've already encountered a similar issue here: https://stackoverflow.com/questions/60101398/kubernetes-without-pod-metrics/60318887#60318887 Could you add `hostNetwork: true` to your deployment? Did you try to use the `weave net` CNI? – PjoterS Mar 04 '20 at 19:05
  • System Info: OS Image: Fedora CoreOS 31.20200210.3.0 Operating System: linux Architecture: amd64 Container Runtime Version: docker://18.9.8 Kubelet Version: v1.17.3 Kube-Proxy Version: v1.17.3 – Frank S Mar 04 '20 at 19:27
  • What about CNI? Did you try to run metrics-server deployment with `hostNetwork: true`? – PjoterS Mar 04 '20 at 19:29
  • I also uncommented hostNetwork: true in my deployment file; no change. – Frank S Mar 04 '20 at 19:35
  • Did you also uncomment `- --kubelet-preferred-address-types=InternalIP` and `- --kubelet-insecure-tls`? Those 2 flags and `hostNetwork: true` cannot be commented out – PjoterS Mar 04 '20 at 19:46
  • Hi, yes, I will update the original deployment config, but still the same result – Frank S Mar 04 '20 at 19:54

2 Answers


I've reproduced your issue. I used Calico as the CNI.

$ kubectl get nodes
NAME              STATUS   ROLES    AGE     VERSION
fedora-master     Ready    master   6m27s   v1.17.3
fedora-worker-1   Ready    <none>   4m48s   v1.17.3
fedora-worker-2   Ready    <none>   4m46s   v1.17.3

fedora-master:~/metrics-server$ kubectl describe apiservice v1beta1.metrics.k8s.io
Status:
  Conditions:
    Last Transition Time:  2020-03-12T16:04:59Z
    Message:               failing or missing response from https://10.99.122.196:443/apis/metrics.k8s.io/v
1beta1: Get https://10.99.122.196:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting
 for connection (Client.Timeout exceeded while awaiting headers)

fedora-master:~/metrics-server$ kubectl top pod
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)

When you have only one node in the cluster, the default settings in the metrics-server repo work correctly. The issue occurs when you have more than 2 nodes; I used 1 master and 2 workers to reproduce it. Below is an example deployment which works correctly (it has all the required args). First, please remove your current metrics-server resources (kubectl delete -f deploy/kubernetes) and execute:

$ git clone https://github.com/kubernetes-sigs/metrics-server
$ cd metrics-server/deploy/kubernetes/
$ vi metrics-server-deployment.yaml

Paste in the YAML below:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      hostNetwork: true
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        imagePullPolicy: IfNotPresent
        args:
          - /metrics-server
          - --kubelet-preferred-address-types=InternalIP
          - --kubelet-insecure-tls
          - --cert-dir=/tmp
          - --secure-port=4443
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      nodeSelector:
        kubernetes.io/os: linux
        kubernetes.io/arch: "amd64"

Save and quit using :wq.

$ cd ~/metrics-server
$ kubectl apply -f deploy/kubernetes/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created

Wait a while for metrics-server to gather a few metrics from nodes.
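
To watch for readiness rather than guessing (both commands are standard kubectl; the APIService should flip to Available once the first scrape succeeds):

$ kubectl -n kube-system rollout status deployment/metrics-server
$ kubectl get apiservice v1beta1.metrics.k8s.io -w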

$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:    
...
Metadata:
  Creation Timestamp:  2020-03-12T16:57:58Z
...
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-03-12T16:58:01Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>

After a few minutes you can use top:

$ kubectl top nodes
NAME              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
fedora-master     188m         9%     1315Mi          17%       
fedora-worker-1   109m         5%     982Mi           13%       
fedora-worker-2   84m          4%     969Mi           13%   

If you still encounter issues, please add - --v=6 to the deployment and provide the logs from the metrics-server pod:

containers:
- name: metrics-server
  image: k8s.gcr.io/metrics-server-amd64:v0.3.1
  args:
    - /metrics-server
    - --v=6
    - --kubelet-preferred-address-types=InternalIP
    - --kubelet-insecure-tls
PjoterS
  • Wow, really appreciate the detailed explanation, but I'm still getting the error. Adding the pastebin link to the original post. Thanks – Frank S Mar 12 '20 at 19:47
  • Did you completely remove the earlier metrics-server resources? What CNI are you using? Did you try `Weave` as the CNI? – PjoterS Mar 12 '20 at 19:54
  • Yes, I had completely removed the earlier metrics-server resources. I'm using Calico as the CNI. I'm thinking of blasting this k8s cluster and going with a k3s cluster, since I'm also having issues getting Rook Ceph to work. – Frank S Mar 12 '20 at 20:28
  • I think I might have found the issue: did you have the aggregator enabled when you built your cluster? Since I use Typhoon to build the cluster, I noticed that it is disabled by default. – Frank S Mar 13 '20 at 14:02
  • For anyone reading this who used Poseidon's Typhoon to build their Kubernetes cluster: you have to enable the aggregation layer; if you look in variables.tf you'll see the arg (see the kube-apiserver flags sketched after these comments). Thanks, @PjoterS, really appreciate the detailed explanation and help! – Frank S Mar 14 '20 at 01:03
  • I've reproduced this on GCE; it was probably enabled there by default. There is also a flag which can be used in the YAML: `- --requestheader-allowed-names=aggregator`. Glad to hear you were able to solve it. – PjoterS Mar 16 '20 at 12:21
  • Thanks, it worked for me after adding "hostNetwork: true" for the Flannel network. – Syed Abdul Qadeer Jun 19 '20 at 06:09
  • I diffed your `kind: Deployment` against what I had deployed to see what you changed. [Got this](https://mpen.xyz/share/2020/07/phpstorm64_2020-07-05_19-17-18.png). Applied it, and it's working now. Thanks! – mpen Jul 06 '20 at 02:19
  • @PjoterS I added `- --requestheader-allowed-names=aggregator` in the YAML file, still no luck. I see failing or missing response from https://x.x.x.x:443/apis/metrics.k8s.io/v1beta1: Get https://x.x.x.x:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) – user312307 Feb 16 '21 at 23:36
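
Since the root cause here turned out to be a disabled aggregation layer, for reference these are the kube-apiserver flags the Kubernetes documentation lists for enabling it. The certificate paths below are the kubeadm defaults and will differ on a Typhoon-built cluster, where, as the comments above note, a variable in variables.tf takes care of this:

--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key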

You need to carefully check the logs of the calico-node pods. In my case I had some other network interfaces, and the autodetection mechanism in Calico was detecting the wrong interface (IP address). You need to consult this documentation: https://projectcalico.docs.tigera.io/reference/node/configuration.
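
For example, to pull the autodetection lines out of the calico-node logs (the k8s-app=calico-node label and the calico-node container name match the upstream Calico manifests; the exact log wording varies by Calico version):

kubectl logs -n kube-system -l k8s-app=calico-node -c calico-node | grep -i autodetect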

What I did in my case was simply:

kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=cidr=172.16.8.0/24

cidr is my "working network". After this, all the calico-node pods restarted and suddenly everything was fine.
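
If pinning an interface is easier than describing a CIDR, the configuration reference above also documents an interface-based method (the interface name here is only an example):

kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=eth0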

Przemek