15

I've got an EC2 launch configuration that launches instances from the ECS-optimized AMI. I've got an auto scaling group that ensures I have at least two available instances at all times. Finally, I've got a load balancer.

I'm trying to create an ECS service that distributes my tasks across the instances in the load balancer.

After reading the documentation for ECS load balancing, it's my understanding that my ASG should not automatically register my EC2 instances with the ELB, because ECS takes care of that. So, my ASG does not specify an ELB. Likewise, my ELB does not have any registered EC2 instances.

When I create my ECS service, I choose the ELB and also select the ecsServiceRole. After creating the service, I never see any instances available in the ECS Instances tab. The service also fails to start any tasks, with a very generic error of ...

service was unable to place a task because the resources could not be found.

I've been at this for about two days now and can't figure out which configuration settings are wrong. Does anybody have any ideas as to what might be causing this not to work?

Update @ 06/25/2015:

I think this may have something to do with the ECS_CLUSTER user data setting.

In my EC2 auto scaling launch configuration, if I leave the user data input completely empty, the instances are created with an ECS_CLUSTER value of "default". When this happens, I see an automatically created cluster named "default". In this default cluster, I see the instances and can register tasks with the ELB as expected. My ELB health check (HTTP) passes once the tasks are registered with the ELB, and all is good in the world.

But if I change that ECS_CLUSTER setting to something custom, I never see a cluster created with that name. If I manually create a cluster with that name, the instances never become visible within the cluster. I can't ever register tasks with the ELB in this scenario.
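For reference, the user data I'm talking about is roughly of this form (the cluster name here is just a placeholder):

#!/bin/bash
echo ECS_CLUSTER=my-custom-cluster >> /etc/ecs/ecs.config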

Any ideas?

Ryan
  • 7,733
  • 10
  • 61
  • 106
  • Just some random ideas to check: are the AZs/subnets of the ELB and the scaling group the same? Can they access each other? How does the health check work on the ELB? Do you see any attached instances on the ELB details page? Do you have logs from the process on the ECS instance that registers the instance with the ELB? – Adam Ocsvari Jun 24 '15 at 23:32
  • Yeah, everything is using the same VPC and subnet. The ELB health check is HTTP, which will work if ECS registers containers with my instances correctly. I'm following the ECS load balancing documentation, which says to skip registering instances with the ELB because ECS takes care of that. I think the issue is with the `ECS_CLUSTER` user data setting. If I leave it as the default, I see an automatically created "default" cluster, in which I can see the instances and can register tasks. If I change it to something custom, I don't see a cluster being created and can't register tasks. – Ryan Jun 25 '15 at 17:04

7 Answers

14

I had similar symptoms but ended up finding the answer in the log files:

/var/log/ecs/ecs-agent.2016-04-06-03:

2016-04-06T03:05:26Z [ERROR] Error registering: AccessDeniedException: User: arn:aws:sts::<removed>:assumed-role/<removed>/<removed> is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-west-2:<removed>:cluster/MyCluster-PROD
    status code: 400, request id: <removed>

In my case, the resource existed but was not accessible. It sounds like the OP is pointing at a resource that doesn't exist or isn't visible. Are your clusters and instances in the same region? The logs should confirm the details.
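If you want to check for the same thing quickly, a small sketch (the log path follows the rotation pattern shown above):

# Scan the rotated agent logs for registration failures
grep -i "error registering" /var/log/ecs/ecs-agent.*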

In response to other posts:

You do NOT need public IP addresses.

You do need the ecsServiceRole or an equivalent IAM role assigned to the EC2 instance so it can talk to the ECS service. You must also specify the ECS cluster; this can be done via user data during instance launch or in the launch configuration definition, like so:

#!/bin/bash
echo ECS_CLUSTER=GenericSericeECSClusterPROD >> /etc/ecs/ecs.config

If you fail to do this at launch, you can add the setting after the instance has launched and then restart the ECS agent.
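As a rough sketch, assuming the Amazon Linux 1 ECS-optimized AMI where the agent runs as the ecs upstart service (the cluster name is a placeholder; newer AMIs use systemd):

# Append the cluster name to the agent config (cluster name is a placeholder)
echo ECS_CLUSTER=MyCluster | sudo tee -a /etc/ecs/ecs.config
# Restart the agent so it re-registers with the right cluster
sudo stop ecs && sudo start ecs
# (on systemd-based AMIs: sudo systemctl restart ecs)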

MrPaws
  • 171
  • 1
  • 3
  • Thanks for the log file locations... I had the same issue as described in this post even though I had the public IP set up correctly. After examining the logs I could determine it was a policy issue - specifically the trust policy as documented in the link below: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html – agp Apr 27 '16 at 19:52
  • 1
    Thanks. As some users point out below, it's the `AmazonEC2ContainerServiceforEC2Role` policy that should be attached to the role you have (either the typical ecsInstanceRole or any other of your own). In that case, just attaching the policy to your role should work without restarts or relaunches. – xmar Aug 13 '18 at 17:07
13

In the end, it turned out that my EC2 instances were not being assigned public IP addresses. It appears ECS needs to be able to communicate directly with each EC2 instance, which would require each instance to have a public IP. I was not assigning my container instances public IP addresses because I thought I'd have them all behind a public load balancer, with each container instance kept private.
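For anyone in the same spot, a rough sketch of the kind of launch configuration I mean, with public IPs enabled (all names and IDs are placeholders, and ecs-user-data.sh would contain the ECS_CLUSTER line from the other answer; see the comments below on why a NAT gateway can make the public IP unnecessary):

aws autoscaling create-launch-configuration \
  --launch-configuration-name ecs-lc \
  --image-id ami-xxxxxxxx \
  --instance-type t2.micro \
  --iam-instance-profile ecsInstanceRole \
  --associate-public-ip-address \
  --user-data file://ecs-user-data.sh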

Ryan
  • 7,733
  • 10
  • 61
  • 106
  • 11
    Your instances should not need public IP addresses, but just need access to reach the ECS endpoints (this can mean either internet access through an IGW or NAT, or an HTTP proxy that allows access to ECS). – Samuel Karp Jan 27 '16 at 22:20
  • 2
    @SamuelK it seems that it does need a public IP: if I run the instance in a subnet with an IGW but without a public IP, it doesn't register. Same settings but with a public IP and it registers. The security group doesn't need any incoming ports open, so it's safe anyway, with the added advantage of being able to SSH in if we need to. – Jesús Carrera Jun 03 '16 at 14:50
  • 4
    You definitely do _not_ need public IP addresses for each of your private instances. The correct way to do this is to set up a NAT Gateway and attach that gateway to the routing table that is attached to your private subnet. – Luke Peterson Jun 20 '17 at 23:58
  • @LukePeterson I'm trying to do exactly this. My EC2 instances have no issue with this set up. I'm new to ECS, but as soon as I remove the public IP from the launch config, the ECS EC2 instances no longer register with the cluster. This is despite being in a private subnet, with a NAT gateway, and external connectivity works fine from the box... (and the public IP isn't routable anyway.. but seems to still be required!) I'm sure there's something I'm missing, but I haven't found it yet ;) – Tim Malone Jun 04 '18 at 01:20
  • OK I found what I was missing - when I copied the launch configuration to remove the public IP, the IAM role also got removed. D'oh! – Tim Malone Jun 04 '18 at 01:43
  • @TimMalone I am facing exactly what you faced: trying to launch an instance via the ECS cluster without a public IP. I chose my VPC, which is configured with NAT, but the instance created still has a public IP. How did you overcome this? Is it possible to launch with only a private IP? – Ashok Kumar Feb 26 '20 at 10:34
3

Another problem that might arise is not assigning a role with the proper policy to the Launch Configuration. My role didn't have the AmazonEC2ContainerServiceforEC2Role policy (or the permissions that it contains) as specified here.
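If it helps, a sketch of attaching that managed policy with the AWS CLI (the role name ecsInstanceRole is just the typical default; substitute your own):

# Role name is an assumption; the policy ARN is the AWS-managed policy referenced above
aws iam attach-role-policy \
  --role-name ecsInstanceRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role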

Nick Spacek
  • 4,717
  • 4
  • 39
  • 42
3

You definitely do not need public IP addresses for each of your private instances. The correct (and safest) way to do this is to set up a NAT Gateway and attach that gateway to the routing table that is attached to your private subnet.

This is documented in detail in the VPC documentation, specifically Scenario 2: VPC with Public and Private Subnets (NAT).
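A minimal CLI sketch of that setup (the subnet, EIP allocation, route table, and NAT gateway IDs are all placeholders):

# The NAT gateway lives in a public subnet; the private subnet's route table
# gets a default route pointing at it
aws ec2 create-nat-gateway --subnet-id subnet-aaaa1111 --allocation-id eipalloc-bbbb2222
aws ec2 create-route --route-table-id rtb-cccc3333 \
  --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-dddd4444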

Luke Peterson
  • 8,584
  • 8
  • 45
  • 46
2

It might also be that the ECS agent creates a file in /var/lib/ecs/data that stores the cluster name.

If the agent first starts up with the cluster name of 'default', you'll need to delete this file and then restart the agent.
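A sketch of doing that on the ECS-optimized AMI, assuming the upstart service name and the data path mentioned above:

sudo stop ecs                    # stop the agent (systemctl stop ecs on systemd AMIs)
sudo rm -f /var/lib/ecs/data/*   # remove the saved state that pins the old cluster name
sudo start ecs                   # the agent re-registers using ECS_CLUSTER from /etc/ecs/ecs.config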

0

Check that the route table has an outgoing route to an internet gateway. This is one important factor in letting the auto scaling/EC2 container instances reach ECS and register themselves with the cluster.
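For example, one way to inspect the routes attached to the instances' subnet (the subnet ID is a placeholder):

# Look for a 0.0.0.0/0 route whose target is an igw- (or nat-) ID
aws ec2 describe-route-tables \
  --filters Name=association.subnet-id,Values=subnet-aaaa1111 \
  --query 'RouteTables[].Routes[]'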

Balki
  • 1
  • 1
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/34594158) – user12256545 Jul 01 '23 at 13:22
-1

There were several layers of problems in our case. I'll list them, as they might give you some idea of the issues to pursue.

My goal was to have one ECS container instance on a single host. But ECS forces you to have two subnets in your VPC, each with one Docker host instance. I was trying to have just one Docker host in one availability zone and could not get it to work.

The other issue was that only one of the subnets had an internet-facing gateway attached to it, so the other was not reachable from the public internet.

The end result was that DNS was serving two IPs for my ELB, and one of the IPs worked while the other did not. So I was seeing random 404s when accessing the NLB via its public DNS name.

David Dehghan
  • 22,159
  • 10
  • 107
  • 95