
I'm brand new to AWS and I have a script which I believe should create an ECS cluster.

When I run the script, my stack hangs in the CREATE_IN_PROGRESS state for over an hour. Eventually, it fails and goes into ROLLBACK_COMPLETE.

In the CloudFormation section of the AWS console, I can go to "Events" and see that two Services I'm trying to create are causing stack creation to fail. However, the only error message is "Resource creation timed out waiting for completion".

I've tried the steps outlined here, including going into CloudTrail, but I'm not really sure what to look for and haven't found anything that helps me resolve the issue. Again, I'm an AWS noob.

What are some steps I can take to get a more detailed error message? How do I go about debugging in AWS?

Any help is appreciated, let me know if I need to provide more info.

ellen
  • What's your template? Which services fail? – Marcin Apr 04 '21 at 04:36
  • It is likely that your issue is the ECS services not being able to stabilize (due to health check failures, etc.). Probably the best way to debug this is to go to the ECS console and check the tasks and the reasons why they are being shut down. – mreferre Apr 04 '21 at 16:49
  • Not sure how to fix either, but found more clarity about the error on this aws page: https://aws.amazon.com/premiumsupport/knowledge-center/cloudformation-physical-resource-error/ – stevestar888 Nov 22 '21 at 16:19

1 Answer


I ran into the same situation with CDK, where my ECS service would fail after 3 hours of CREATE_IN_PROGRESS. A big obstacle to debugging is that when the rollback happens, it wipes your ECS cluster and its event history. However, if you go to the ECS console's task list while the stack is still being created, you should see a task, and I bet it's stuck in a PENDING state. There are a lot of possible reasons for this. When a task fails to reach the desired state, the reason it failed is added to the service's events. To get there:

Cluster > Service > Service Name

On this page there's an Events tab

Select a task and it will show that it STOPPED. In my case, it looks like ECS couldn't pull the container image (CannotPullContainerError).
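The same information shown in the console is also available as JSON, for example from `aws ecs describe-tasks`, whose output has a `tasks` array with fields such as `lastStatus`, `stoppedReason`, and per-container `reason`. Below is a minimal sketch, not a definitive tool, of pulling the failure reasons out of data shaped like that array. The interface names, the `summarizeStoppedTasks` helper, and the sample data (ARNs and messages) are all hypothetical and only for illustration; the sketch makes no AWS calls itself.

```typescript
// Minimal shapes covering only the DescribeTasks fields used here.
interface ContainerInfo {
  name?: string;
  reason?: string; // e.g. starts with "CannotPullContainerError" when the image pull fails
}

interface TaskInfo {
  taskArn?: string;
  lastStatus?: string; // e.g. "PENDING", "RUNNING", "STOPPED"
  stoppedReason?: string;
  containers?: ContainerInfo[];
}

interface StoppedTaskSummary {
  taskArn: string;
  stoppedReason: string;
  containerReasons: string[];
}

// Hypothetical helper: keep only STOPPED tasks and collect their reasons.
function summarizeStoppedTasks(tasks: TaskInfo[]): StoppedTaskSummary[] {
  return tasks
    .filter((task) => task.lastStatus === "STOPPED")
    .map((task) => ({
      taskArn: task.taskArn ?? "(unknown task)",
      stoppedReason: task.stoppedReason ?? "(no reason recorded)",
      containerReasons: (task.containers ?? [])
        .filter((c) => c.reason !== undefined)
        .map((c) => `${c.name ?? "(unnamed)"}: ${c.reason}`),
    }));
}

// Made-up sample data standing in for the `tasks` array of a real response.
const sampleTasks: TaskInfo[] = [
  {
    taskArn: "arn:aws:ecs:us-east-1:123456789012:task/demo/aaa",
    lastStatus: "STOPPED",
    stoppedReason: "(illustrative stopped reason)",
    containers: [
      { name: "web", reason: "CannotPullContainerError: (message elided)" },
    ],
  },
  {
    taskArn: "arn:aws:ecs:us-east-1:123456789012:task/demo/bbb",
    lastStatus: "RUNNING",
  },
];

for (const summary of summarizeStoppedTasks(sampleTasks)) {
  console.log(summary.taskArn);
  console.log(`  stopped: ${summary.stoppedReason}`);
  for (const reason of summary.containerReasons) {
    console.log(`  container ${reason}`);
  }
}
```

Since the rollback removes the cluster, the underlying data would need to be captured while the stack is still in CREATE_IN_PROGRESS.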

axecopfire
  • Had that error about the ECS service being unable to pull the image. See https://aws.amazon.com/premiumsupport/knowledge-center/ecs-unable-to-pull-secrets/. The problem was that the ECS task was not getting auto-assigned a public IP, which it needs in order to access AWS ECR (search for "Be sure to enable Auto-assign public when you launch a new task or create a new service"). Fixed by adding a flag to the Fargate ALB service config; see this CDK code: const service = new ecs_patterns.ApplicationLoadBalancedFargateService(this, 'AcmeFulfillmentService', { assignPublicIp: true, certificate: 'certArn' – Jose Quijada Aug 26 '22 at 22:24