
I am writing a Django app that I plan to deploy to AWS via Elastic Beanstalk. I am trying to understand why I would need to specify 'leader_only' for a container command I want to run for my app. More details about this can be found here.

It says:

Additionally, you can use leader_only. One instance is chosen to be the leader in an Auto Scaling group. If the leader_only value is set to true, the command runs only on the instance that is marked as the leader.

If I have several instances running my app because I want to scale it, wouldn't using 'leader_only' run the command on only one instance, and not affect the rest? I am probably misunderstanding the purpose of it, but that seems non-ideal because the environment in the leader may differ from the other instances, and the end user may get different results depending on which instance they happen to connect to.

capcom
  • As stated below, but to be concise, it's used when you only need to run something once against the stack. Example: running database migrations. – Ryan Fisher Jun 03 '17 at 08:43

2 Answers


From a technical point of view, an Elastic Beanstalk environment is an Auto Scaling group, so when you deploy something you need to assume that your commands may be executed simultaneously on several EC2 instances.

The main goal of the leader_only option is to make sure that a command is executed on only one EC2 instance. It is useful for tasks such as running database migration scripts or creating the database, which should be executed just once, on one instance. So leader_only is just a marker that restricts a command to the leader instance.

However, keep in mind that the leader attribute is set once, when your environment is created. If the leader dies and is replaced by a new instance, you can end up with no leader in the Auto Scaling group.
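To illustrate why running a command on only one instance does not leave the others inconsistent, here is a toy Python model of a deployment. It is not Elastic Beanstalk code: the `Instance` class, the `deploy` function, and the two example commands are all hypothetical names made up for this sketch. The point it tries to show is that a leader_only command is meant for work against a resource shared by all instances (such as the database), while per-instance work is left unflagged and runs everywhere.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One EC2 instance in the environment (toy model, hypothetical)."""
    name: str
    is_leader: bool = False
    local_files: set = field(default_factory=set)


def deploy(instances, shared_db, commands):
    """Run each command on every instance, honouring a leader_only flag."""
    for inst in instances:
        for cmd in commands:
            if cmd.get("leader_only") and not inst.is_leader:
                continue  # skipped on non-leader instances
            cmd["run"](inst, shared_db)


def collect_static(inst, shared_db):
    # Per-instance effect: every instance needs its own copy.
    inst.local_files.add("static/")


def migrate(inst, shared_db):
    # Shared effect: the database is common to all instances.
    shared_db["migrations_applied"] += 1


instances = [Instance("i-a", is_leader=True), Instance("i-b"), Instance("i-c")]
shared_db = {"migrations_applied": 0}
commands = [
    {"run": collect_static},                 # runs on all three instances
    {"run": migrate, "leader_only": True},   # runs on the leader only
]
deploy(instances, shared_db, commands)
```

In this model the migration is applied once to the shared database, yet every instance still sees the migrated schema, and every instance has its own static files.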

djvg
Vadym Fedorov
  • I have read varying posts on SO spanning over a few years that AWS now retains leader_only properly when scaling up and down, any idea? – digout Dec 10 '18 at 11:56

I've done considerable testing of this recently, covering both leader_only and EB_IS_COMMAND_LEADER, on both Apache 1 and Apache 2 setups.

The two named values above can be found in many discussions, guides and documents, but the situation is basically this:

You cannot reliably detect a leader in an environment with multiple EC2 instances, except during deployment and scale-up.

That means you cannot use the testing of either of the values above to confirm a command will run on exactly one (not zero, not 2+) instance as part of a cron job or scheduled task.

Recent improvements and changes to the way leader status is managed may well mean that a leader is always available during deployments and scale up, but at other times, including after instance replacement, there may not be a leader instance to be found.

There are two main options available if you really need to only run a scheduled task once while managing multiple instances.

  1. A worker environment specifically for scheduled tasks, or another external service like Lambda with EventBridge (CloudWatch Events)

  2. Set up cron jobs to run on all instances via the deployment configs. Before the job runs, include a small amount of code that connects to the AWS API, gets a list of the current instances, and compares the ID of the first one returned against the instance's own ID to decide whether it should run the job.
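Option 2 could look roughly like the Python sketch below. It is a minimal sketch under several assumptions, not a tested production guard: it assumes boto3 is installed, that the instance role is allowed to call `describe_environment_resources`, that the instance metadata service (IMDSv2) is reachable, and that you pass in your own environment name. The function names are hypothetical. It also picks the lowest instance ID rather than the first one returned, so that every instance reaches the same answer regardless of the order of the API response.

```python
import urllib.request

IMDS = "http://169.254.169.254/latest"


def get_own_instance_id(timeout=2):
    """Ask the EC2 instance metadata service (IMDSv2) for this instance's ID."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    id_req = urllib.request.Request(
        f"{IMDS}/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(id_req, timeout=timeout) as resp:
        return resp.read().decode()


def list_environment_instance_ids(environment_name):
    """Return the instance IDs currently in the Elastic Beanstalk environment."""
    import boto3  # imported lazily so should_run() is usable without boto3

    client = boto3.client("elasticbeanstalk")
    resp = client.describe_environment_resources(EnvironmentName=environment_name)
    return [i["Id"] for i in resp["EnvironmentResources"]["Instances"]]


def should_run(own_id, instance_ids):
    """True only for the instance whose ID sorts first; False for an empty list."""
    return bool(instance_ids) and min(instance_ids) == own_id


def run_if_elected(environment_name, task):
    """Call task() only if this instance is the one elected to run it."""
    ids = list_environment_instance_ids(environment_name)
    if should_run(get_own_instance_id(), ids):
        task()
```

The cron entry on every instance would then call something like `run_if_elected("my-env", do_work)`, where both arguments are placeholders. Note that during scaling events two instances may briefly see different instance lists, so the scheduled task should ideally be safe to run twice or to miss once.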

Harry B