
Why does the following error occur when I install Linkerd 2.x on a private cluster in GKE?

Error: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: tap.linkerd.io/v1alpha1: the server is currently unable to handle the request
cpretzer

5 Answers


Solution:

The steps I followed are:

  1. kubectl get apiservices : If the linkerd apiservice is down with the error CrashLoopBackOff, try to follow step 2; otherwise just try restarting the linkerd apiservice using kubectl delete apiservice/"service_name". For me it was v1alpha1.tap.linkerd.io.

  2. kubectl get pods -n kube-system : I found that pods like metrics-server, linkerd, and kubernetes-dashboard were down because the main CoreDNS pod was down.

For me it was:

NAME                          READY   STATUS             RESTARTS   AGE
pod/coredns-85577b65b-zj2x2   0/1     CrashLoopBackOff   7          13m
  3. Use kubectl describe pod/"pod_name" to check the error in the CoreDNS pod. If it is down because of /etc/coredns/Corefile:10 - Error during parsing: Unknown directive proxy, then we need to use forward instead of proxy in the yaml file holding the CoreDNS config, because the CoreDNS 1.5.x image no longer supports the proxy keyword. A sketch of this change is shown after this list.
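
As a rough illustration of that last step (the ConfigMap and deployment names below are the usual CoreDNS defaults and may differ in your cluster):

# open the CoreDNS config for editing
kubectl -n kube-system edit configmap coredns

# in the Corefile, replace the removed directive:
#   proxy . /etc/resolv.conf
# with its replacement:
#   forward . /etc/resolv.conf

# then restart CoreDNS so it picks up the change (deployment name assumed to be coredns)
kubectl -n kube-system rollout restart deployment coredns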
Sanket Singh

The default firewall rules of a private cluster on GKE only permit traffic on ports 443 and 10250. This allows communication to the kube-apiserver and kubelet, respectively.

Linkerd uses ports 8443 and 8089 for communication between the control plane and the proxies deployed to the data plane.

The tap component uses port 8089 to handle requests to its apiserver.

The proxy injector and service profile validator components, both of which are types of admission controllers, use port 8443 to handle requests.

The Linkerd 2 docs include instructions for configuring your firewall on a GKE private cluster: https://linkerd.io/2/reference/cluster-configuration/

They are included below:

Get the cluster name:

CLUSTER_NAME=your-cluster-name
gcloud config set compute/zone your-zone-or-region

Get the cluster MASTER_IPV4_CIDR:

MASTER_IPV4_CIDR=$(gcloud container clusters describe $CLUSTER_NAME \
  | grep "masterIpv4CidrBlock: " \
  | awk '{print $2}')

Get the cluster NETWORK:

NETWORK=$(gcloud container clusters describe $CLUSTER_NAME \
  | grep "^network: " \
  | awk '{print $2}')

Get the cluster auto-generated NETWORK_TARGET_TAG:

NETWORK_TARGET_TAG=$(gcloud compute firewall-rules list \
  --filter network=$NETWORK --format json \
  | jq ".[] | select(.name | contains(\"$CLUSTER_NAME\"))" \
  | jq -r '.targetTags[0]' | head -1)

Verify the values:

echo $MASTER_IPV4_CIDR $NETWORK $NETWORK_TARGET_TAG

# example output
10.0.0.0/28 foo-network gke-foo-cluster-c1ecba83-node

Create the firewall rules for proxy-injector and tap:

gcloud compute firewall-rules create gke-to-linkerd-control-plane \
  --network "$NETWORK" \
  --allow "tcp:8443,tcp:8089" \
  --source-ranges "$MASTER_IPV4_CIDR" \
  --target-tags "$NETWORK_TARGET_TAG" \
  --priority 1000 \
  --description "Allow traffic on ports 8443, 8089 for linkerd control-plane components"

Finally, verify that the firewall is created:

gcloud compute firewall-rules describe gke-to-linkerd-control-plane
cpretzer

In my case, it was related to linkerd/linkerd2#3497: the Linkerd service had some internal problems and couldn't respond to the API service requests. It was fixed by restarting its pods.
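
For reference, a minimal sketch of that restart, assuming the control plane is installed in the default linkerd namespace:

# restart all Linkerd control-plane deployments (assumes the "linkerd" namespace)
kubectl -n linkerd rollout restart deployment

# or, on older kubectl versions, delete the pods and let the deployments recreate them
kubectl -n linkerd delete pod --all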

kivagant

This was a linkerd issue for me. To diagnose any linkerd-related issues, you can use the linkerd CLI and run linkerd check; this should show you whether there is an issue with linkerd, with links to instructions on how to fix it.
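
For example, the checks can be run against the whole installation or against the data-plane proxies as well:

# full health check of the Linkerd installation, including certificate validity
linkerd check

# also check the proxies injected into your workloads
linkerd check --proxy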

For me, the issue was that the linkerd root certs had expired. In my case, linkerd was experimental in a dev cluster, so I removed it. However, if you need to update your certificates, you can follow the instructions at the link below.

https://linkerd.io/2.11/tasks/replacing_expired_certificates/
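
Very roughly, that guide boils down to generating a new trust anchor and issuer certificate and feeding them to linkerd upgrade. The sketch below uses the step CLI and paraphrases those steps, so check the linked page for the exact flags for your Linkerd version:

# generate a new trust anchor (root CA)
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca --no-password --insecure

# generate a new issuer certificate and key signed by that root
step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \
  --profile intermediate-ca --not-after 8760h --no-password --insecure \
  --ca ca.crt --ca-key ca.key

# roll the new credentials out to the control plane
linkerd upgrade \
  --identity-trust-anchors-file ca.crt \
  --identity-issuer-certificate-file issuer.crt \
  --identity-issuer-key-file issuer.key \
  | kubectl apply -f -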

Thanks to https://stackoverflow.com/a/59644120/1212371 I was put on the right path.

Tyrone Wilson

In my case, it was caused by a NetworkPolicy blocking apiserver access to the linkerd tap service. Keep in mind that if you have a NetworkPolicy whose podSelector matches the tap pods and whose Ingress rules specify a from section, then the apiserver will need to be given access explicitly.

It can be tricky to give only the apiserver access, so you might need to just open that port for the tap service with a separate netpol (any other suggestions welcome though).
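
A rough sketch of such a policy; the namespace, pod labels, and CIDR below are placeholders, so substitute your tap pods' labels and your cluster's MASTER_IPV4_CIDR:

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-apiserver-to-tap
  namespace: linkerd
spec:
  podSelector:
    matchLabels:
      linkerd.io/control-plane-component: tap   # adjust to match your tap pods
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/28                   # placeholder: your MASTER_IPV4_CIDR
      ports:
        - protocol: TCP
          port: 8089                            # tap apiserver port
EOF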

Andrew DiNunzio