Since you mentioned Jenkins, I'll answer with the Jenkins master service in mind, but the answer remains valid for any other case (even though a Jenkins master isn't a great example for ECS: it doesn't scale horizontally, so there can be only one instance).
503 errors
I've often encountered 503 errors caused by the load balancer failing its health checks (no healthy instance). Have a look at your load balancer's Monitoring tab to make sure the healthy host count is always above 0.
If you're using an HTTP health check, it must return a 200 status code (the list of accepted codes is configurable in the load balancer settings) only when your server is really up and running. Otherwise the load balancer could route traffic to instances that aren't fully started yet.
If you always get a 503, it may be that your instances take too long to answer while the service is initializing, so the health checks fail and ECS considers the tasks unhealthy and kills them before their initialization is complete. That's often the case on a Jenkins first run.
To avoid that last problem, consider adapting your load balancer's ping target (the health check target for a classic load balancer, the target group's health check for an application load balancer):
- With an application load balancer, point the health check at something that will always return 200 (for Jenkins it may be a public file, like /robots.txt for example).
- With a classic load balancer, use a TCP port test rather than an HTTP test. It will always succeed as long as the port is open.
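As a sketch, the two options above might look like this (the names and port numbers are example values from this answer, adjust them to your setup):

```
# Application load balancer: health check settings on the target group
HealthCheckPath: /robots.txt    # anything that returns 200 once Jenkins answers
Matcher: HttpCode=200

# Classic load balancer: health check settings
Ping Target: TCP:8081           # the host port mapped to your container
```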
One node per instance
If you need to be sure you have only one task per instance, you can use a classic load balancer (it also behaves well with ECS). With classic load balancers, ECS ensures that only one task of the service runs per server.
That's also the only solution for exposing non-HTTP ports (for instance Jenkins needs port 80, but also port 50000 for the slaves).
However, as the ports are not dynamic with a classic load balancer, you have to do some port mapping, for example:
myloadbalancer.mydomain.com:80 (port 80 of the load balancer) -> instance:8081 (host port on the EC2 instance) -> service:80 (port inside your container).
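In the task definition, the instance-to-container part of that mapping corresponds to something like this (using the example ports above):

```json
"portMappings": [
    {
        "hostPort": 8081,
        "containerPort": 80,
        "protocol": "tcp"
    }
]
```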
And of course you need one load balancer per service.
Jenkins healthcheck
If that's really a Jenkins service that you want to launch, you should use the Jenkins Metrics plugin to obtain a good healthcheck URL.
Install it, then in the global options generate a token and activate the ping endpoint; you should then be able to reach a URL that looks like this: http://myjenkins.domain.com/metrics/mytoken12b3ad1/ping
This URL returns HTTP 200 only when the server is fully running, which lets the load balancer mark the instance healthy only when it's completely ready.
Logs
Finally, if you want to know what is happening to your instance and why it is failing, you can send the container's output to AWS CloudWatch Logs.
Just add this to the task definition (container configuration):
- Log configuration: awslogs
- awslogs-group: mycompany (the CloudWatch log group that will gather your container logs)
- awslogs-region: us-east-1 (your cluster's region)
- awslogs-stream-prefix: myservice (a prefix used to build the log stream name)
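In the task definition JSON, that container configuration looks roughly like this (same example values as above):

```json
"logConfiguration": {
    "logDriver": "awslogs",
    "options": {
        "awslogs-group": "mycompany",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "myservice"
    }
}
```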
It will give you more insight into what is happening during container initialization, whether it just takes too long or is actually failing.
Hope it helps!