20

After deleting the Kubernetes cluster with "terraform destroy", I can't create it again.

"terraform apply" returns the following error message:

Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

Here is the terraform configuration:

terraform {
  backend "s3" {
    bucket = "skyglass-msur"
    key    = "terraform/backend"
    region = "us-east-1"
  }
}

locals {
  env_name         = "staging"
  aws_region       = "us-east-1"
  k8s_cluster_name = "ms-cluster"
}

variable "mysql_password" {
  type        = string
  description = "Expected to be retrieved from environment variable TF_VAR_mysql_password"
}

provider "aws" {
  region = local.aws_region
}

data "aws_eks_cluster" "msur" {
  name = module.aws-kubernetes-cluster.eks_cluster_id
}

module "aws-network" {
  source = "github.com/skyglass-microservices/module-aws-network"

  env_name              = local.env_name
  vpc_name              = "msur-VPC"
  cluster_name          = local.k8s_cluster_name
  aws_region            = local.aws_region
  main_vpc_cidr         = "10.10.0.0/16"
  public_subnet_a_cidr  = "10.10.0.0/18"
  public_subnet_b_cidr  = "10.10.64.0/18"
  private_subnet_a_cidr = "10.10.128.0/18"
  private_subnet_b_cidr = "10.10.192.0/18"
}

module "aws-kubernetes-cluster" {
  source = "github.com/skyglass-microservices/module-aws-kubernetes"

  ms_namespace       = "microservices"
  env_name           = local.env_name
  aws_region         = local.aws_region
  cluster_name       = local.k8s_cluster_name
  vpc_id             = module.aws-network.vpc_id
  cluster_subnet_ids = module.aws-network.subnet_ids

  nodegroup_subnet_ids     = module.aws-network.private_subnet_ids
  nodegroup_disk_size      = "20"
  nodegroup_instance_types = ["t3.medium"]
  nodegroup_desired_size   = 1
  nodegroup_min_size       = 1
  nodegroup_max_size       = 5
}

# Create namespace
# Use kubernetes provider to work with the kubernetes cluster API
provider "kubernetes" {
  # load_config_file       = false
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.msur.certificate_authority.0.data)
  host                   = data.aws_eks_cluster.msur.endpoint
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws-iam-authenticator"
    args        = ["token", "-i", "${data.aws_eks_cluster.msur.name}"]
  }
}

# Create a namespace for microservice pods
resource "kubernetes_namespace" "ms-namespace" {
  metadata {
    name = "microservices"
  }
}

P.S. There seems to be an issue with the Terraform Kubernetes provider on 0.14.7.

I couldn't use "load_config_file" = false in this version, so I had to comment it out, which seems to be the cause of this issue.

P.P.S. It could also be an issue with an outdated cluster_ca_certificate that Terraform tries to use: deleting this certificate might be enough, although I'm not sure where it is stored.
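For reference, pinning the provider would look something like the block below; the version constraint is only an illustration, not what I currently have in the configuration:

terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      # illustrative constraint only; adjust to the version you actually want
      version = ">= 2.0"
    }
  }
}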

Mykhailo Skliar

8 Answers

40

Before doing something radical like manipulating the state directly, try setting the KUBE_CONFIG_PATH environment variable:

export KUBE_CONFIG_PATH=/path/to/.kube/config

After this, rerun the plan or apply command. This fixed the issue for me.
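If you prefer not to rely on an environment variable, the kubernetes provider (v2) also accepts the same path directly; a minimal sketch, assuming the default kubeconfig location:

provider "kubernetes" {
  # equivalent to exporting KUBE_CONFIG_PATH before running terraform
  config_path = "~/.kube/config"
}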

Urosh T.
for some reason my env had `KUBECONFIG` set instead, which seemed to work until recently. `KUBE_CONFIG_PATH` works for me. Cool! – jitter Nov 11 '21 at 11:29
9

I had the same issue. I even manually deleted the EKS cluster, which really messed up the Terraform state.

However, after wasting a few hours, I found out that there is a very simple solution.

You can run

terraform state rm <resource_type>.<resource_name>

I just executed

terraform state rm `terraform state list | grep eks`

to remove all the entries for a particular service from the state file in a safe manner.
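As a sketch applied to the configuration in the question, the addresses would likely look something like this (the exact names are guesses; use whatever terraform state list actually prints):

# see what Terraform is still tracking
terraform state list | grep -E 'eks|kubernetes'

# drop the resources that live inside the now-unreachable cluster
terraform state rm kubernetes_namespace.ms-namespace

# optionally drop the whole EKS module from state as well
terraform state rm module.aws-kubernetes-cluster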

harsha
  • Thanks, I solved my problem with: terraform state rm `terraform state list | grep eks`, then terraform state rm `terraform state list | grep kubectl`, then terraform state rm `terraform state list | grep helm` – Shivam Anand Apr 17 '23 at 05:17
2

This happened to me when I needed to make an update to the cluster that required deleting some resources. You can also try running terraform apply -refresh=false and just let it destroy them.
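A minimal sketch of that, assuming the configuration from the question:

# plan and apply without refreshing the unreachable cluster state
terraform apply -refresh=false

# or, if the goal is a full teardown
terraform destroy -refresh=false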

1

Deleting the Terraform state S3 bucket on AWS solved the issue.
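For completeness, a sketch of doing that with the AWS CLI, using the bucket name from the question's backend (this permanently deletes the remote state, so treat it as a last resort):

# WARNING: removes all remote state stored in the bucket
aws s3 rm s3://skyglass-msur --recursive
aws s3 rb s3://skyglass-msur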

Mykhailo Skliar
1

In my case this error occurred when I was trying to destroy resources with 'tf destroy'.

The solution for me was the following sequence (see the sketch below):

  1. Run 'tf apply -refresh=true' on the Terraform state where you bootstrap the K8S cluster. This is the workspace where you output the K8S credentials (k8s_cluster_access_token).
  2. Run 'tf apply -refresh=true' on the Terraform state that uses the above credentials to create K8S resources.
  3. Run 'tf destroy' (finishes successfully).
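A sketch of that sequence as shell commands; the directory names are placeholders for the two workspaces:

# 1. refresh the workspace that bootstraps the EKS cluster and outputs its credentials
cd cluster-bootstrap/
terraform apply -refresh=true

# 2. refresh the workspace that consumes those credentials to create K8S resources
cd ../k8s-resources/
terraform apply -refresh=true

# 3. the destroy now succeeds
terraform destroy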
Nepomucen
1

I solved this by using the official Helm provider instead of the Kubernetes one.

First, we list the required providers:

terraform {
  backend "s3" {
    bucket  = "..."
    key     = "..."
    region  = "..."
    profile = "..."
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.49"
    }

    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.16.1"
    }

    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.8.0"
    }
  }
}

Then, we configure the provider:

data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
    exec {
      api_version = "client.authentication.k8s.io/v1"
      command = "aws"
      args = [
        "eks",
        "get-token",
        "--cluster-name",
        data.aws_eks_cluster.cluster.name,
        "--profile",
        var.profile
      ]
    }
  }
}

Finally, we add charts via the helm_release resources:

resource "helm_release" "foo" {
  name             = "foo"
  chart            = "foo"
  repository       = "https://foo.bar/chart"
  namespace        = "foo"
  create_namespace = true
  values           = [templatefile("${path.module}/chart/values.yaml", {})]
}
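If you still need native Kubernetes resources such as the kubernetes_namespace from the question, the same exec-based authentication can be reused with the kubernetes provider; a sketch, assuming the same data source and profile variable as above:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  exec {
    api_version = "client.authentication.k8s.io/v1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster.name, "--profile", var.profile]
  }
}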
gRizzlyGR
0

Deleting the .terraform sub-folder in the folder where you run the "terraform" command should also solve the issue.

I didn't try it for this exact situation, but I had a similar issue today, so I decided to share another solution. It seems less radical than deleting the S3 bucket.
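A minimal sketch of that cleanup, run from the directory holding the configuration (terraform init recreates the folder and re-downloads the providers; the remote state in S3 is untouched):

rm -rf .terraform
terraform init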

Mykhailo Skliar
0

In my case, the cause was that my kubeconfig, which usually has something in it, was empty for some reason. Somewhat of a random fix, but when I reinitialized the config (I'm using Minikube, so I started minikube), Terraform was happy again.

I'm curious if using the AWS command line to update the kubeconfig would work in your case: https://docs.aws.amazon.com/cli/latest/reference/eks/update-kubeconfig.html
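A sketch of that command for the cluster in the question (region taken from the locals; the actual cluster name depends on how the module names it):

aws eks update-kubeconfig --region us-east-1 --name ms-cluster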

matteo