I wrote a web application with Flask and deployed it to a VM instance on Google Compute Engine. The backend is a machine learning model. I can access the application through the instance's external IP, but if two or more users try to make a prediction at the same time, the app crashes. I think the solution is to scale the VM instances horizontally so that more users can use the app at once.
I looked at the Google docs and the question linked below to get a general idea of the steps: "Use existent VM Instace (bitnami) for Autoscale Group of Instances"
But I am still confused about how autoscaling works:
- The VM instance template does not contain my application files or virtual environment. How can I add them to the template, or is that impossible?
- If I deploy the app to one of the automatically generated VM instances, will the new instances created as more people use the app be exact copies of the first one, containing all the files of the web app?
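
To make the questions concrete, here is a rough sketch of the workflow as I currently understand it. I have not verified that this is the right approach. It assumes that a custom image can be created from my existing VM's boot disk and then used in the instance template. The resource names (`my-flask-vm`, `flask-app-image`, `flask-app-template`, `flask-app-group`), the zone, and the scaling thresholds are placeholders I made up. The script only builds and prints the `gcloud` commands; it does not run them.

```python
import shlex


def build_autoscale_commands(
    source_disk,
    zone,
    image="flask-app-image",        # placeholder name
    template="flask-app-template",  # placeholder name
    group="flask-app-group",        # placeholder name
    max_replicas=5,                 # assumed upper bound
    target_cpu=0.6,                 # assumed CPU utilization target
):
    """Return the gcloud commands for the assumed workflow as argument lists."""
    return [
        # 1. Bake an image from the disk that already holds the app and virtualenv.
        #    The VM probably has to be stopped first (or the image forced).
        ["gcloud", "compute", "images", "create", image,
         f"--source-disk={source_disk}", f"--source-disk-zone={zone}"],
        # 2. Create an instance template that boots from that image.
        ["gcloud", "compute", "instance-templates", "create", template,
         f"--image={image}"],
        # 3. Create a managed instance group from the template.
        ["gcloud", "compute", "instance-groups", "managed", "create", group,
         f"--template={template}", "--size=1", f"--zone={zone}"],
        # 4. Turn on autoscaling for the group.
        ["gcloud", "compute", "instance-groups", "managed", "set-autoscaling", group,
         f"--max-num-replicas={max_replicas}",
         f"--target-cpu-utilization={target_cpu}", f"--zone={zone}"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_autoscale_commands("my-flask-vm", "us-central1-a"):
        print(shlex.join(cmd))
```

If this is roughly right, I would also need the Flask app to start automatically on boot in every new instance, and I am not sure whether that belongs in the image or in a startup script.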