How do I configure managed instance group and autoscaling in Google Cloud Platform

Question

Autoscaling lets you automatically add or remove Compute Engine instances based on load. The prerequisites for autoscaling in GCP are an instance template and a managed instance group.

This question is a part of another question's answer, which is about building an autoscaled and load-balanced backend.

I have written the below answer that contains the steps to set up autoscaling in GCP.

Just reading through the meta about these questions. I agree with your approach - to create separate questions, and invite potentially high quality answers from other users on specific issues. However, perhaps you could phrase these questions to be more like questions? E.g. "How do I configure managed instance group and autoscaling in the Google Cloud platform?". You could even include a little of the context from the original question. This might make the question more searchable, and would allow people to answer it without having to know the original context. — Graham Harper, Jan 12 '17 at 16:51
@graham I have changed the question as per your suggestion. Do you want me to remove the reference "This question is a part of another question's answer" from the question ? — Lakshman Diwaakar, Jan 13 '17 at 00:33

score 21 · Accepted Answer · edited Dec 17 '18 at 20:51

Autoscaling is a feature of managed instance groups in GCP. It handles high traffic by adding instances, and it removes instances again when traffic drops, which saves money.

To set up autoscaling, we need the following:

Instance template
Managed Instance group
Autoscaling policy
Health Check

An instance template is a blueprint that defines the machine type, image, and disks of the homogeneous instances that will run in the autoscaled managed instance group. I have written the steps for setting up an instance template here.
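For completeness, here is a minimal sketch of creating such a template with gcloud. Only the name sample-template comes from this answer; the machine type and image are placeholder assumptions, and the function name is hypothetical.

```shell
# Sketch: create the instance template used by the managed instance group.
# e2-medium and the Debian image are assumed placeholders; substitute the
# machine type and image your application actually needs.
create_sample_template() {
  gcloud compute instance-templates create sample-template \
    --machine-type e2-medium \
    --image-family debian-12 \
    --image-project debian-cloud
}
```

Calling create_sample_template runs the command against the currently configured project.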

A managed instance group maintains a group of homogeneous instances based on a single instance template. Assuming the instance template is named sample-template, the group can be created by running the following gcloud command:

gcloud compute instance-groups managed \
create autoscale-managed-instance-group \
--base-instance-name autoscaled-instance \
--size 3 \
--template sample-template \
--region asia-northeast1

The above command creates a managed instance group of 3 Compute Engine instances, based on sample-template and spread across three different zones in the asia-northeast1 region.

base-instance-name is the base name for all the automatically created instances. Each instance name consists of the base name followed by a uniquely generated random string.
size is the desired number of instances in the group. For now, 3 instances run all the time, regardless of how much traffic the application receives. Later, the group can be autoscaled by applying a policy to it.
region (multi-zone) or single-zone: A managed instance group can be set up either in a region (multi-zone), where the homogeneous instances are evenly distributed across the zones of that region, or in a single zone, where all the instances are deployed in the same zone. It can also be deployed as a cross-region one, which is currently in alpha.
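To check where the instances of the group ended up, something like the following sketch can be used. It reuses the group name and region from the command above; the function name is hypothetical.

```shell
# Sketch: list the instances of the regional managed instance group,
# including the zone and status of each one.
show_group_instances() {
  gcloud compute instance-groups managed list-instances \
    autoscale-managed-instance-group \
    --region asia-northeast1
}
```

Running show_group_instances after the group is created should list three instances whose names start with autoscaled-instance.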

The autoscaling policy determines the autoscaler's behaviour. The autoscaler aggregates data from the instances, compares it with the desired capacity specified in the policy, and determines the action to take. There are many autoscaling policies, such as:

Average CPU Utilization
HTTP load balancing serving capacity (requests / second)
Stackdriver standard and custom metrics
and many more

Now, introduce autoscaling to this managed instance group by running the following gcloud command:

gcloud compute instance-groups managed \
set-autoscaling \
autoscale-managed-instance-group \
--max-num-replicas 6 \
--min-num-replicas 2 \
--target-cpu-utilization 0.60 \
--cool-down-period 120 \
--region asia-northeast1

The above command sets up an autoscaler based on CPU utilization that keeps the group between 2 instances (when there is no traffic) and 6 instances (under heavy traffic).
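To get a feel for what a 0.60 CPU target means, here is a rough back-of-the-envelope sketch. It assumes a simple proportional model (scale the group so that average utilization returns to the target, then clamp to the min and max); this is a simplification for illustration, not the autoscaler's actual algorithm, and the function name is hypothetical.

```shell
# Simplified model: size = ceil(current_size * observed / target),
# clamped to [min, max]. Utilization is given as a whole percentage.
recommended_size() {
  local current_size=$1 observed_pct=$2 target_pct=$3 min=$4 max=$5
  # Integer ceiling division.
  local size=$(( (current_size * observed_pct + target_pct - 1) / target_pct ))
  if [ "$size" -lt "$min" ]; then size=$min; fi
  if [ "$size" -gt "$max" ]; then size=$max; fi
  echo "$size"
}

recommended_size 3 90 60 2 6   # 3 instances at 90% CPU -> prints 5
recommended_size 3 10 60 2 6   # nearly idle, clamped to the minimum -> prints 2
recommended_size 6 95 60 2 6   # overloaded, clamped to the maximum -> prints 6
```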

The cool-down-period flag specifies the number of seconds to wait after an instance has started before the associated autoscaler starts collecting information from it.
An autoscaler can be associated with a maximum of 5 different policies. When there is more than one policy, the autoscaler follows the policy that results in the largest number of instances.
Interesting fact: when an instance is spun up by the autoscaler, the autoscaler makes sure that the instance runs for at least 10 minutes, irrespective of the traffic. This is done because GCP bills for a minimum of ten minutes of running time for a Compute Engine instance. It also protects against erratic spinning up and shutting down of instances.
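The health check listed among the prerequisites is not shown in the commands above. A minimal sketch of creating one and attaching it to the group might look like this; the health check name, port, path, thresholds, and initial delay are all assumptions for illustration, as are the function names.

```shell
# Sketch: create an HTTP health check. The name autoscale-health-check and
# all the values below are assumed placeholders.
create_health_check() {
  gcloud compute health-checks create http autoscale-health-check \
    --port 80 \
    --request-path / \
    --check-interval 10s \
    --timeout 5s \
    --healthy-threshold 2 \
    --unhealthy-threshold 3
}

# Sketch: attach the health check to the managed instance group so that
# unhealthy instances are recreated. The initial delay (in seconds) gives
# new instances time to boot before they are checked.
attach_health_check() {
  gcloud compute instance-groups managed update \
    autoscale-managed-instance-group \
    --health-check autoscale-health-check \
    --initial-delay 300 \
    --region asia-northeast1
}
```

Run create_health_check first, then attach_health_check. The probes also need a firewall rule that allows them to reach the instances.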

Best Practices: From my perspective, it is better to create a custom image with all your software installed than to use a startup script, because the time taken to launch new instances in the autoscaling group should be as short as possible. This will increase the speed at which you scale your web app.

This is part 2 of a 3-part series about building an autoscaled and load-balanced backend.

What's the difference between setting the CPU utilization on the instance group vs setting it on the command to add a backend or a backend service? They both support the option, which seems redundant and confusing. — odigity, May 09 '17 at 21:03
"It can also be deployed as cross region one, which is currently in alpha." Could you expand on this? I would like to use this but it seems that there is no such feature present. — Tadas Šubonis, Aug 01 '18 at 16:21
