AWS autoscaling starts not ready instances because of userdata script

Question

I have an Auto Scaling group that works great, with a launch configuration in which I defined a user data script that is executed when a new instance launches.

The user data script updates the codebase and generates the cache, which takes a few seconds. But as soon as the instance is "created" (and not yet "ready"), Auto Scaling adds it to the load balancer.

This is a problem because while the user data script is running, the instance does not respond correctly (basically, 500 errors are thrown).

I would like to avoid that. Of course, I saw this documentation: http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/InstallingAdd

As with a standalone EC2 instance, you have the option of configuring instances launched into an Auto Scaling group using user data. For example, you can specify a configuration script using the User data field in the AWS Management Console, or the --userdata parameter in the AWS CLI.

If you have software that can't be installed using a configuration script, or if you need to modify software manually before Auto Scaling adds the instance to the group, add a lifecycle hook to your Auto Scaling group that notifies you when the Auto Scaling group launches an instance. This hook keeps the instance in the Pending:Wait state while you install and configure the additional software.

It looks like this is not my case. Also, handling the pending hook from the user data script is complicated. There must be a simpler solution to my problem.

Thank you for your help!

I'm not sure I understand why this is a problem. When the instance is "created" and Auto Scaling adds it to the ELB, the ELB will test the instance, and the instance will need to pass your configured number of health checks BEFORE the ELB makes it available. You of course then need to set up the health check so that it fails while your user data script is executing. – Brooks, Feb 10 '16 at 14:25
You mean that AWS considers the instance "healthy" even when the user data script has not finished executing? – Nek, Feb 10 '16 at 14:26
EC2 considers the instance "healthy" once the VM is running (i.e. when the OS has booted up, the AMI has completely launched, etc.). It does not account for user data commands being completed. For example, I will often SSH in to a new instance, tail the log file and watch the output of the user data commands as they're executed. But, all that aside, the ELB will not consider the instance healthy and begin sending traffic to it until it passes the health checks which you configured when you launched the ELB. So, that's where you need to focus. – Brooks, Feb 10 '16 at 14:41
Thank you. I solved my problem by shutting down the HTTP server at the start of the user data script. That way the load balancer can't get a green health status, and it does not send clients to the instance. I restart the HTTP server at the end of the script; the health check then passes, so the ELB serves the instance. – Nek, Feb 10 '16 at 15:43
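
Brooks's suggestion of a health check that fails while user data is running could also be done without stopping the server. The following is only a minimal sketch under assumptions that are not in the thread: a marker file (`/var/run/app-ready`, a made-up path) that the user data script creates as its very last step, and a `/health` path that the ELB health check is configured to hit.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical marker file, created by the user data script as its last step.
READY_MARKER = "/var/run/app-ready"


def health_status(marker_path=READY_MARKER):
    """Return 200 once the marker file exists, 503 before that."""
    return 200 if os.path.exists(marker_path) else 503


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the (assumed) /health path is answered; anything else is a 404.
        status = health_status() if self.path == "/health" else 404
        self.send_response(status)
        self.end_headers()


def serve(port=8080):
    """Serve the health endpoint; the port is an arbitrary choice for this sketch."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling `serve()` early in boot would make the instance answer 503 on `/health` until the marker file appears, so the ELB keeps it out of rotation during setup. In a real application the same check would more likely live inside the existing web app than in a separate server.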

score 6 · Answer 1 · edited Apr 24 '23 at 19:09

EC2 user data does not use a lifecycle hook, so nothing stops a newly launched instance from being brought into service before the user data script has finished executing.

Stopping your web server at the start of your user data script sounds a little unreliable to me, and therefore I would urge you to utilize the features AutoScaling provides that were designed to solve this very problem.

I have two suggestions:

Option 1:

Using lifecycle hooks isn't at all complicated once you read through the docs. In your user data, you can easily use the CLI to control the hook. In fact, a hook can be controlled from any supported programming or scripting language.

Option 2:

If manually taking care of lifecycle hooks doesn't appeal to you, then I would recommend scrapping your user data script and using a workaround with AWS CodeDeploy. You could have CodeDeploy deploy nothing (e.g. an empty S3 folder) and use the deployment hook scripts to replace your user data script. CodeDeploy integrates with Auto Scaling seamlessly and handles lifecycle hooks automatically. A newly launched instance won't be brought into service by Auto Scaling until a deployment has succeeded. Read the CodeDeploy docs for more info.

However, I would urge you to go with option 1. Lifecycle hooks were designed to solve the very problem you have. They're powerful, robust, awesome and free. Use them.
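
For what it's worth, here is a rough sketch of what Option 1 might look like from Python rather than the CLI. It assumes a launch lifecycle hook already exists on the group; the function name and the group, hook and instance names are made up for illustration. The only AWS call used is `complete_lifecycle_action` on a boto3 Auto Scaling client, which accepts an `InstanceId` in place of a `LifecycleActionToken`.

```python
def complete_launch_hook(client, instance_id, group_name, hook_name, succeeded=True):
    """Tell Auto Scaling that the instance finished (or failed) its setup.

    `client` is expected to be a boto3 Auto Scaling client, for example
    boto3.client("autoscaling"). It is passed in so the function can be
    exercised without AWS credentials.
    """
    return client.complete_lifecycle_action(
        AutoScalingGroupName=group_name,
        LifecycleHookName=hook_name,
        InstanceId=instance_id,
        # CONTINUE lets the instance proceed into service;
        # ABANDON tells Auto Scaling to terminate it instead.
        LifecycleActionResult="CONTINUE" if succeeded else "ABANDON",
    )
```

The user data script would call this as its last step, e.g. `complete_launch_hook(boto3.client("autoscaling"), instance_id, "my-asg", "my-launch-hook")`, with the instance ID read from the instance metadata service. The instance role needs permission for this action, and the hook's timeout has to be longer than the setup takes.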

edited Apr 24 '23 at 19:09

Jzou

answered Feb 10 '16 at 15:54

mickzer

The lifecycle hook is complicated to use because you can't be sure of retrieving the right message from SQS for your current instance. There is probably a workaround, like putting messages that don't concern the current instance back in the queue, but what about concurrency? :/ All these problems lead me to another solution, which is simply using the health status. – Nek Feb 10 '16 at 22:02
As I said in my answer, lifecycle hooks are not hard to use. Please read the docs and familiarize yourself with the core concepts. I have no clue what you are talking about regarding SQS, you haven't mentioned SQS anywhere else in this post. Maybe you need to elaborate on your particular use case? Use the tools already built and designed to solve your problem, don't reinvent the wheel. I urge you to follow the docs and try and test a solution using lifecycle hooks. – mickzer Feb 10 '16 at 22:11
I don't want to deploy with CodeDeploy: we have a complex deploy process and a deploy script that already works well (with fabric and boto). Also, CodeDeploy is not free, so switching to a paid solution that implies a double cost (I mean dev cost) is not really a good solution. Also, I'm aware of the docs (as I said in my first post), and I tried to work with lifecycle hooks and SQS (topic subscription looks like a joke solution, as at best it redirects to SQS). And here is my problem: I don't see any easy way to deal with concurrency, SQS and lifecycle hooks. – Nek Feb 11 '16 at 09:30
CodeDeploy for EC2 is free. You haven't mentioned concurrency or SQS in your question. You should update it with more information about your situation and describe exactly what you are trying to achieve. – mickzer Feb 11 '16 at 10:08

score 2 · Accepted Answer · answered Feb 10 '16 at 15:45

@Brooks said the easiest way to "wait" before the ELB serves the instance is to rely on the ELB health status.

I solved my problem by shutting down the HTTP server at the start of the user data script. That way the ELB can't get a green health status, and it does not send clients to the instance. I restart the HTTP server at the end of the script; the health check then passes, so the ELB serves the instance.
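
As an illustration only, the flow could be sketched like this in Python (a user data script may be any executable script with a shebang line). The `service httpd ...` commands and the `provision` helper are hypothetical stand-ins for whatever controls your web server and runs your own setup steps.

```python
import subprocess

# Hypothetical commands; substitute whatever controls your web server.
STOP_SERVER = ["service", "httpd", "stop"]
START_SERVER = ["service", "httpd", "start"]


def provision(setup_commands, run=subprocess.check_call):
    """Stop the web server, run the setup steps, then start it again.

    The server is restarted only if every setup command succeeds, so a
    failed setup leaves the instance failing its ELB health check.
    """
    run(STOP_SERVER)
    for command in setup_commands:
        # check_call raises CalledProcessError on a non-zero exit status,
        # which aborts before the server is started again.
        run(command)
    run(START_SERVER)
```

The `run` parameter exists only so the ordering can be checked without touching a real server. Note that this relies on the ELB health check targeting the HTTP server itself.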
