
When I launch the node normally, everything works fine, but when I try to launch it using a launch template, I'm having connection issues within the cluster.

More specifically, the aws-node pod fails with the error:

{"level":"info","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}

Digging through other posts here, many people seem to point to IAM role issues, but my IAM role is fine; besides, I've been using the same role to launch many other nodes and they launched successfully.

Here are my Terraform files:

resource "aws_eks_node_group" "eth-staking-nodes" {
  cluster_name    = aws_eks_cluster.staking.name
  node_group_name = "ethstaking-nodes-testnet"
  node_role_arn   = aws_iam_role.nodes.arn

  subnet_ids = [
    data.aws_subnet.private-1.id,
    data.aws_subnet.private-2.id
  ]

  scaling_config {
    desired_size = 1
    max_size     = 5
    min_size     = 0
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    role = "general"
  }

  launch_template {
    version = aws_launch_template.staking.latest_version
    id      = aws_launch_template.staking.id
  }

  depends_on = [
    aws_iam_role_policy_attachment.nodes-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.nodes-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.nodes-AmazonEC2ContainerRegistryReadOnly,
  ]
}

The launch template:

esource "aws_launch_template" "staking" {
  name          = "${var.stage}-staking-node-launch-template"
  instance_type = "m5.2xlarge"
  image_id      = "ami-08712c7468e314435"

  key_name = "nivpem"
  
  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size = 450
      volume_type = "gp2"
    }
  }

  lifecycle {
    create_before_destroy = false
  }

  vpc_security_group_ids = [aws_security_group.eks-ec2-sg.id]
  user_data = base64encode(templatefile("${path.module}/staking_userdata.sh", {
    password = "********"
  }))

  tags = {
    "eks:cluster-name"   = aws_eks_cluster.staking.name
    "eks:nodegroup-name" = "ethstaking-nodes-testnet"
  }

  tag_specifications {
    resource_type = "instance"

    tags = {
      Name                 = "${var.stage}-staking-node"
      "eks:cluster-name"   = aws_eks_cluster.staking.name
      "eks:nodegroup-name" = "ethstaking-nodes-testnet"
    }
  }
}

The security group:

resource "aws_security_group" "eks-ec2-sg" {
  name        = "eks-ec2-sg-staking-testnet"
  vpc_id      = data.aws_vpc.vpc.id

  ingress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }

  tags = {
    Name = "allow_tls"
  }
}
  • Have you configured the networking as per the requirements? For example, are your nodes in a private subnet? If so, is a NAT Gateway deployed? Is there a corresponding Internet Gateway attached to the VPC? – Marko E Apr 10 '23 at 13:15
  • Yes, they are. If I launch the node with the same configuration but without the launch template, it works perfectly. – Niv Shitrit Apr 10 '23 at 23:21
  • Does the answer you've gotten work for you then? – Marko E Apr 11 '23 at 06:58
  • No, I still can't get the node to work when launching through a launch template. This screams security group issue, as I'm seeing 'connection refused' among the errors from the aws-node pod. I reviewed the node that got created, and everything from security groups to the IAM role seems to be correct and should suffice. – Niv Shitrit Apr 12 '23 at 11:42
  • Would you mind adding the SG configuration to the question? – Marko E Apr 12 '23 at 12:43
  • It's set to 0.0.0.0/0 for all ports (for testing's sake). I will add it to the question. – Niv Shitrit Apr 13 '23 at 07:38
  • Did you try adding a `vpc_config` block with `endpoint_public_access` set to true in your `aws_eks_cluster` resource? – Mohammad Teimori Pabandi Apr 13 '23 at 14:55
  • @MohammadTeimoriPabandi that was it! Although in my case I was using private subnets, so I had to set the value of endpoint_private_access to true. Thank you very much. – Niv Shitrit Apr 24 '23 at 20:21
  • I'll add it as an answer so you can select it as the correct one :D. – Mohammad Teimori Pabandi Apr 25 '23 at 11:47

1 Answer


Consider adding a vpc_config block with endpoint_public_access set to true in your aws_eks_cluster resource. Since your nodes are in private subnets, you will also want endpoint_private_access set to true; that should make it work.
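
For reference, a minimal sketch of such a cluster definition, assuming a cluster resource named "staking" and reusing the private subnets from the node group (the name and role_arn values below are placeholders; the two endpoint_* settings are the relevant part):

resource "aws_eks_cluster" "staking" {
  name     = "${var.stage}-staking-testnet"   # placeholder cluster name
  role_arn = aws_iam_role.cluster.arn         # placeholder cluster IAM role

  vpc_config {
    subnet_ids = [
      data.aws_subnet.private-1.id,
      data.aws_subnet.private-2.id,
    ]

    # Nodes in private subnets need the private API endpoint so the kubelet
    # and the aws-node (VPC CNI) pod can reach the control plane:
    endpoint_private_access = true
    # Keep public access enabled if you also reach the API from outside the VPC:
    endpoint_public_access  = true
  }
}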