How does horizontal autoscaling of an application work on Compute Engine?

Question

I wrote a web application with Flask and deployed it to a VM instance on Google Compute Engine. The backend is a machine learning model. I can access the application through the VM's external IP, but if two or more users try to run a prediction at the same time, the app crashes. I think the solution is to scale the VM instances horizontally so that more users can access it.

I looked at the Google documentation and the post linked below to get a general idea of the steps: Use existent VM Instace (bitnami) for Autoscale Group of Instances

But I am still confused about how autoscaling works.

The VM instance template does not contain my files and virtual environment. How can I add these to the template, or is that impossible?
If I deploy the app to one of the automatically generated VM instances, will the new instances created as more people use the app be exactly the same as the first one (containing all the files of the web app)?

Alex G · Answer 1 · 2021-04-13T01:07:53.177

First, the main issue is that your app crashes when two or more users use it. You arrived at autoscaling because you suspect the root cause is that the VM does not have enough resources. I would recommend testing the app on a VM with more resources first to see whether it really is a resource issue. Better still, check your application's error logs for clues about why it is crashing.
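One way to reproduce the crash on purpose, so that the error logs can be read right after it happens, is to send a few requests at the same moment. The following is only a minimal sketch using the Python standard library. The `/predict` path, port 5000, and the JSON payload are hypothetical placeholders, not details from the question.

```python
import concurrent.futures
import urllib.request


def post_once(url, payload, timeout=30):
    """Send one POST request and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run_concurrently(send, users=2):
    """Call send() from `users` threads at once; collect results or errors."""
    outcomes = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(send) for _ in range(users)]
        for future in futures:
            try:
                outcomes.append(future.result())
            except Exception as error:
                # Keep the failure as text so it can be compared with the app logs.
                outcomes.append(repr(error))
    return outcomes


# Example call (hypothetical URL and payload, replace with the real ones):
# url = "http://EXTERNAL_IP:5000/predict"
# body = b'{"features": [1, 2, 3]}'
# print(run_concurrently(lambda: post_once(url, body), users=2))
```

If the same failure shows up with two simultaneous requests on a larger VM as well, the cause is probably not a lack of resources, and autoscaling alone is unlikely to fix it.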

To answer your questions on autoscaling:

You create your instance template from a disk image of the VM that runs your application; this is also described in the answers to the post linked in your question.
The app needs to be part of the image used by the instance template. The managed instance group then creates identical instances from that template and adds or removes them automatically based on load.
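The two points above roughly correspond to four `gcloud` commands: create an image from the VM's disk, create a template from the image, create a managed instance group from the template, and turn on autoscaling. The sketch below only assembles those commands as Python lists and prints them unless `dry_run=False` is passed. All resource names, the replica limits, and the 0.6 CPU target are assumed placeholder values, and creating an image from a disk usually requires stopping the VM first.

```python
import subprocess


def autoscaling_commands(disk, zone, image="flask-app-image",
                         template="flask-app-template",
                         group="flask-app-group",
                         min_replicas=1, max_replicas=5, target_cpu=0.6):
    """Return the gcloud commands, in order, as argument lists."""
    return [
        # 1. Capture the configured VM's disk (app files, virtualenv) as an image.
        ["gcloud", "compute", "images", "create", image,
         f"--source-disk={disk}", f"--source-disk-zone={zone}"],
        # 2. Create an instance template that boots from that image.
        ["gcloud", "compute", "instance-templates", "create", template,
         f"--image={image}"],
        # 3. Create a managed instance group that builds VMs from the template.
        ["gcloud", "compute", "instance-groups", "managed", "create", group,
         f"--template={template}", "--size=1", f"--zone={zone}"],
        # 4. Let the group grow and shrink based on average CPU utilization.
        ["gcloud", "compute", "instance-groups", "managed", "set-autoscaling",
         group, f"--min-num-replicas={min_replicas}",
         f"--max-num-replicas={max_replicas}",
         f"--target-cpu-utilization={target_cpu}", f"--zone={zone}"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


# Example (hypothetical disk name and zone):
# run_all(autoscaling_commands("my-flask-vm", "us-central1-a"))
```

Every instance in the group boots from the same image, so the app has to start on its own at boot. A load balancer in front of the group is also needed to spread users across instances; that part is not covered in this sketch.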

How do I configure managed instance group and autoscaling in Google Cloud Platform

Thank you. I tried with a custom image, the new instances did contain all the necessary files. — Jonas C, Apr 12 '21 at 14:24
