16

I'm following the kOps tutorial to set up a cluster on AWS. I am able to create a cluster with

kops create cluster
kops update cluster --yes

However, when validating whether my cluster is set up correctly with

kops validate cluster

I get stuck with this error:

unexpected error during validation: error listing nodes: Unauthorized

The same error happens in many other kOps operations.

I checked my kOps/K8s version and it is 1.19:

> kops version
Version 1.19.1 (git-8589b4d157a9cb05c54e320c77b0724c4dd094b2)

> kubectl version
Client Version: version.Info{Major:"1", Minor:"20" ...
Server Version: version.Info{Major:"1", Minor:"19" ...

How can I fix this?

roim

2 Answers

38

As of kOps 1.19 there are two reasons you will suddenly get this error:

  1. If you delete a cluster and reprovision it, the old admin credentials are not removed from your kubeconfig, and kOps/kubectl tries to reuse them.
  2. New certificates have a TTL of 18h by default, so you need to re-export them about once a day.

Both issues above are fixed by running kops export kubecfg --admin.
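As a minimal sketch of that fix, the helper below re-exports the admin credentials and then re-runs validation. The function name `refresh_admin_credentials` and its arguments are hypothetical. It assumes `KOPS_STATE_STORE` is already set and passes the cluster name via the `--name` flag. The optional second argument uses the `--admin=<duration>` form described in the kOps 1.19 release notes.

```shell
# Hypothetical helper: refresh the admin credentials in the local kubeconfig,
# then check that the cluster validates again.
# Assumes KOPS_STATE_STORE is already exported in the environment.
refresh_admin_credentials() {
  cluster_name="$1"   # the name the cluster was created with
  ttl="${2:-}"        # optional credential lifetime, e.g. 18h

  if [ -z "$cluster_name" ]; then
    echo "usage: refresh_admin_credentials <cluster-name> [ttl]" >&2
    return 2
  fi

  # Overwrite any stale admin entry left in the kubeconfig.
  if [ -n "$ttl" ]; then
    kops export kubecfg --admin="$ttl" --name "$cluster_name" || return 1
  else
    kops export kubecfg --admin --name "$cluster_name" || return 1
  fi

  # Confirm the fresh credentials are accepted by the API server.
  kops validate cluster --name "$cluster_name"
}
```

With the default 18h lifetime, this would need to be run roughly once a day, or right after recreating a cluster under the same name.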

Note that using the default TLS credentials is discouraged. Consider using an OIDC provider for authentication instead.

Ole Markus With
  • Oh interesting, I might have been hit by (1), and the fix I mentioned in another answer also required me to export kubecfg again, so that might have incidentally fixed it. Thanks! – roim Feb 25 '21 at 02:50
  • Even if I use kops export kubecfg --admin, the Kubeconfig file that I get still has a TTL of 18 hours (although the cluster validation runs fine) @Ole is that what you observed too? – Sagar Kalburgi Mar 09 '21 at 00:06
  • Yes. For security reasons, that is the intended behaviour. – Ole Markus With Mar 09 '21 at 08:29
  • For a solution to get back the old behavior, taken from: https://github.com/kubernetes/kops/blob/master/docs/releases/1.19-NOTES.md, you need to provide the ttl when exporting the credentials, ex: `kops export kubecfg --admin=87600h` – mitsos1os Apr 16 '21 at 09:42
  • Strongly discourage creating unrevocable long-lived certificates like that. At the moment, rolling cluster CA involves extended API outage. – Ole Markus With Apr 16 '21 at 13:54
3

Kubernetes v1.19 removed basic auth support, which incidentally left the default kOps credentials unable to authenticate. To work around this, we will update our cluster to use a Network Load Balancer (NLB) instead of the default Classic Load Balancer (CLB). The NLB can be accessed with non-deprecated authentication mechanisms.

After creating your cluster, but before updating cloud resources (that is, before running with --yes), edit its configuration to use an NLB:

kops edit cluster

Then update your load balancer class to Network:

spec:
  api:
    loadBalancer:
      class: Network

Now update cloud resources with

kops update cluster --yes

After that, kOps will be able to authenticate against your cluster.
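Put together, the sequence above might look like the following sketch. The function name `create_cluster_with_nlb` is hypothetical, `KOPS_STATE_STORE` is assumed to be set, and any extra arguments are passed straight through to `kops create cluster`. The `kops edit cluster` step opens an editor, where the `class: Network` setting is added by hand.

```shell
# Hypothetical helper: create a cluster config, switch the API load balancer
# to an NLB by hand, and only then create the cloud resources.
# Assumes KOPS_STATE_STORE is already exported in the environment.
create_cluster_with_nlb() {
  cluster_name="$1"
  if [ -z "$cluster_name" ]; then
    echo "usage: create_cluster_with_nlb <cluster-name> [create-cluster args...]" >&2
    return 2
  fi
  shift

  # 1. Write the cluster configuration only (no --yes, so nothing is provisioned).
  kops create cluster --name "$cluster_name" "$@" || return 1

  # 2. Opens an editor; set spec.api.loadBalancer.class to Network and save.
  kops edit cluster --name "$cluster_name" || return 1

  # 3. Provision the cloud resources with the edited configuration.
  kops update cluster --name "$cluster_name" --yes
}
```

The important design point is the ordering: the edit happens before the first `--yes`, so the cluster is provisioned with the NLB from the start.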

Note that there are several other advantages to using an NLB as well; check the AWS docs for a comparison.

If you have a pre-existing cluster that you want to move to an NLB, there are more steps to follow, for example to ensure clients don't start failing authentication and to delete old resources. You'll find a better guide for that in the kOps v1.19 release notes.

roim