29

Background:

  • I have a VPC with 3 public subnets (each subnet has a route to an internet gateway)

  • I have an EKS cluster in this VPC; the cluster was created from the console, not with eksctl

  • I followed this tutorial from the official AWS documentation and managed to set up the ALB controller; the controller is running perfectly:

The cluster contains two node groups:

  • First node group has one node of type: t3a.micro
  • Second node group has one node of type: t3.small
$ kubectl get deployment -n kube-system aws-load-balancer-controller
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
aws-load-balancer-controller   1/1     1            1           60m

I used the 2048 game example from the tutorial; here is the manifest file:

---
apiVersion: v1
kind: Namespace
metadata:
  name: game-2048
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: game-2048
  name: deployment-2048
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      containers:
      - image: alexwhen/docker-2048
        imagePullPolicy: Always
        name: app-2048
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  namespace: game-2048
  name: service-2048
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: NodePort
  selector:
    app.kubernetes.io/name: app-2048
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: service-2048
              servicePort: 80
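
(Note: the `extensions/v1beta1` Ingress API used above was deprecated and removed in Kubernetes 1.22; on newer clusters the same Ingress would be written against `networking.k8s.io/v1`, roughly as follows:)

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb        # replaces the kubernetes.io/ingress.class annotation
  rules:
    - http:
        paths:
          - path: /            # v1 uses pathType instead of the /* wildcard
            pathType: Prefix
            backend:
              service:
                name: service-2048
                port:
                  number: 80
```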

However when I describe ingress: I get the following messages

$ kubectl describe ingress/ingress-2048 -n game-2048
Name:             ingress-2048
Namespace:        game-2048
Address:
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host        Path  Backends
  ----        ----  --------
  *
              /*   service-2048:80 (172.31.4.64:80)
Annotations:  alb.ingress.kubernetes.io/scheme: internet-facing
              alb.ingress.kubernetes.io/target-type: ip
              kubernetes.io/ingress.class: alb
Events:
  Type     Reason            Age                From     Message
  ----     ------            ----               ----     -------
  Warning  FailedBuildModel  9s (x13 over 32s)  ingress  Failed build model due to couldn't auto-discover subnets: unable to discover at least one subnet

Here are the tags set on the 3 subnets: [screenshot: subnet tags]

And here are the route tables for the subnets; as you can see, they have an internet gateway attached: [screenshot: route tables]

I searched everywhere, and every answer talks about adding the tags. I even created a completely new cluster from scratch, but I'm still getting this issue. Is there anything else I'm missing?

I checked this answer, but it's not relevant because it's for a Classic ELB, not an ALB.

================================

Update:

I explicitly added the subnets:

alb.ingress.kubernetes.io/subnets: subnet-xxxxxx, subnet-xxxxx, subnet-xxx

And now I got my external address, but with a warning:

$  kubectl describe ingress/ingress-2048 -n game-2048
Name:             ingress-2048
Namespace:        game-2048
Address:          k8s-game2048-ingress2-330cc1efad-115981283.eu-central-1.elb.amazonaws.com
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host        Path  Backends
  ----        ----  --------
  *
              /*   service-2048:80 (172.31.13.183:80)
Annotations:  alb.ingress.kubernetes.io/scheme: internet-facing
              alb.ingress.kubernetes.io/subnets: subnet-8ea768e4, subnet-bf2821f2, subnet-7c023801
              alb.ingress.kubernetes.io/target-type: ip
              kubernetes.io/ingress.class: alb
Events:
  Type     Reason             Age   From     Message
  ----     ------             ----  ----     -------
  Warning  FailedDeployModel  43s   ingress  Failed deploy model due to ListenerNotFound: One or more listeners not found
           status code: 400, request id: e866eba4-328c-4282-a399-4e68f55ee266
  Normal   SuccessfullyReconciled  43s  ingress  Successfully reconciled

Also, opening the external address in the browser returns: 503 Service Temporarily Unavailable

Sabir Moglad
  • Shouldn't `alb.ingress.kubernetes.io/target-type: ip` be `alb.ingress.kubernetes.io/target-type: instance`? – Marcin Feb 04 '21 at 05:02
  • If I understand correctly, both should work: `ip` means the pod IP is exposed and the ALB talks directly to the pod. I tried `instance` and got the same error. – Sabir Moglad Feb 04 '21 at 05:11
  • @SabirMoglad I'm facing the same issue, can you please tell me which subnet should you use Public or Private? – Kathak Dabhi Feb 04 '21 at 09:14
  • @KathakDabhi what do you mean? I just added all of the subnets in my cluster (public and private) to the yaml file, but that doesn't work still. Something is wrong. – Sabir Moglad Feb 04 '21 at 09:18
  • there is an annotation `alb.ingress.kubernetes.io/subnets` to specify which subnets to use; in that case, as @KathakDabhi asked, I wonder which (public or private subnets) to use – Ben Aug 28 '22 at 10:18

6 Answers

44

In my case, it was because I hadn't tagged the AWS subnets with the correct resource tags. https://kubernetes-sigs.github.io/aws-load-balancer-controller/guide/controller/subnet_discovery/

Edit - 5/28/2021

Public Subnets should be resource tagged with: kubernetes.io/role/elb: 1

Private Subnets should be tagged with: kubernetes.io/role/internal-elb: 1

Both private and public subnets should be tagged with: kubernetes.io/cluster/${your-cluster-name}: owned

or if the subnets are also used by non-EKS resources kubernetes.io/cluster/${your-cluster-name}: shared

Source: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.1/deploy/subnet_discovery/
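
As a sketch, the tags can be applied with the AWS CLI (the subnet IDs and cluster name below are placeholders; substitute your own):

```shell
# Public subnets: allow internet-facing load balancers
aws ec2 create-tags --resources subnet-aaaa subnet-bbbb \
  --tags "Key=kubernetes.io/role/elb,Value=1"

# Private subnets: allow internal load balancers
aws ec2 create-tags --resources subnet-cccc \
  --tags "Key=kubernetes.io/role/internal-elb,Value=1"

# All subnets used by the cluster (use Value=shared if non-EKS resources share them)
aws ec2 create-tags --resources subnet-aaaa subnet-bbbb subnet-cccc \
  --tags "Key=kubernetes.io/cluster/my-cluster,Value=owned"
```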

Andrew
  • That link no longer exists (404) – Blunderchips May 28 '21 at 09:45
  • @Blunderchips fixed. – Andrew May 28 '21 at 18:03
  • Thanks a lot! In my case, everything was correct except the cluster name in the label `kubernetes.io/cluster/${your-cluster-name}: owned`, and fixing that resolved it :) – Jerald Sabu M Jan 14 '22 at 09:29
  • A small answer but a huge help; otherwise I would have spent minutes/hours figuring this out. Thanks a lot. (For others coming to try out the ALB, use this sample ingress with the annotation: https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.3.1/docs/examples/2048/2048_full_latest.yaml) – Alex Punnen Sep 07 '22 at 07:29
42

Ensure that --cluster-name in the aws-load-balancer-controller deployment is correctly configured.

Use

kubectl get deployment -n kube-system aws-load-balancer-controller -o yaml | grep "cluster-name"

to get the cluster name in the deployment.

If it isn't correct, edit the deployment with the following command and fix the name:

kubectl edit deployment -n kube-system aws-load-balancer-controller
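
If the controller was installed with Helm rather than raw manifests, the same fix can be applied non-interactively (the release name and chart repo below assume the standard install; adjust as needed):

```shell
helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --reuse-values \
  --set clusterName=my-cluster   # replace with your actual EKS cluster name
```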

TlmaK0
  • Geez, that was the issue! How come this was not set? – Sabir Moglad Feb 04 '21 at 13:31
  • My bad, I know what step I skipped: "ii. Edit the saved yaml file. Delete the ServiceAccount section from the yaml specification. Doing so prevents the annotation with the IAM role from being overwritten when the controller is deployed and preserves the service account that you created in step 4 if you delete the controller. In the Deployment spec section set the --cluster-name value to your Amazon EKS cluster name." – Sabir Moglad Feb 04 '21 at 13:34
  • We have all made the same mistake :) – TlmaK0 Feb 05 '21 at 09:32
  • this answer saved my day – sunsets Feb 15 '21 at 11:10
  • This is correct; also check whether the correct cluster name is added to all subnets as a tag – JeewanaSL Aug 11 '22 at 13:56
5

If upgrading from v2.1 to v2.2 of the aws-load-balancer-controller, be aware that you will get this same error, as new IAM permissions are required. See the CHANGELOG in the release for details and links to the new permissions: https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases/tag/v2.2.0

The explicit link to the IAM Permissions: https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.2.0/docs/install/iam_policy.json
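
A sketch of refreshing the policy in place (the policy name below assumes the AWSLoadBalancerControllerIAMPolicy name from the install guide; substitute your account ID):

```shell
# Download the v2.2.0 policy document
curl -o iam_policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.2.0/docs/install/iam_policy.json

# Publish it as the new default version of the existing policy
aws iam create-policy-version \
  --policy-arn arn:aws:iam::<account-id>:policy/AWSLoadBalancerControllerIAMPolicy \
  --policy-document file://iam_policy.json \
  --set-as-default
```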

Gowiem
3

You can also explicitly define the subnets to use:

alb.ingress.kubernetes.io/subnets: subnet-xxx,subnet-yyyy

although it's still recommended to enable auto-discovery.

a.k
3

I ran into the same issue. My setup is EKS with Fargate, fully provisioned via Terraform. I had all the correct annotations but was still getting the error Failed build model due to couldn't auto-discover subnets: unable to discover at least one subnet. Eventually, after trying to set the subnets manually as an annotation, I got a different error: Failed build model due to InvalidParameterValue: vpc-id. I then realised I had set the wrong value for the Helm parameter vpcId (I had mistakenly used the ARN instead of the ID; with Fargate you have to set it explicitly, as there is no EC2 instance metadata available on the host). Changing this to the ID solved the problem.
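
A sketch of passing the correct value, assuming a Helm-based install (`--output text` avoids the surrounding quotes that the default JSON output adds):

```shell
# Fetch the VPC *ID* (not the ARN) for the cluster
VPC_ID=$(aws eks describe-cluster --name my-cluster --region eu-central-1 \
  --query cluster.resourcesVpcConfig.vpcId --output text)

# Pass it to the controller chart
helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system --reuse-values --set vpcId="$VPC_ID"
```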

VHristov
  • Your answer clued me in to my issue: My script was grabbing the WRONG vpc-id; a hard-coded environment buried in a query was pulling the dev VPC rather than prod. Thanks for the hint! – SomeCallMeTim Feb 09 '23 at 22:07
  • This made me realize that the `vpcId` being set when installing with Helm had quotes surrounding the actual ID, which the controller tried to interpret as part of the VPC ID. It was because I got the VPC ID with: ```aws eks describe-cluster --name $cluster_name --region $region_code --query cluster.resourcesVpcConfig.vpcId```. I needed to use the `--output text` option to make it return the ID without quotes. – Nick Aberle Jun 27 '23 at 20:17
0

I had the same issue with a cluster I created manually in the AWS console.

But then I tried creating the cluster using eksctl, which created subnets with slightly different tags, i.e.:

Key                                           Value
Name                                          eksctl-cluster-name-cluster/SubnetPublicUSEAST1A
aws:cloudformation:logical-id                 SubnetPublicUSEAST1A
kubernetes.io/role/elb                        1
aws:cloudformation:stack-name                 eksctl-cluster-name-cluster
alpha.eksctl.io/cluster-name                  cluster-name
aws:cloudformation:stack-id                   stack-id
alpha.eksctl.io/eksctl-version                0.76.0
eksctl.cluster.k8s.io/v1alpha1/cluster-name   cluster-name

Subnet discovery could be related to some of these tags, or to some other subnet/IAM configuration.
I suggest trying to create the cluster using eksctl.
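
For reference, a minimal eksctl invocation (the names and sizes below are just examples) that provisions a VPC whose subnets carry the discovery tags automatically:

```shell
eksctl create cluster \
  --name my-cluster \
  --region eu-central-1 \
  --nodegroup-name ng-1 \
  --node-type t3.small \
  --nodes 1
```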

RanmaGo