
We host a sale every month. Once all the deals data is ready, we send a notification to all of our users. As a result we get huge traffic within seconds, and it lasts for about an hour. Currently we change the instance class to F4_1G before the sale starts and back to F1 after one hour. Is there a better way to handle this?

imshayc
  • Automatically changing the instance type is not possible. Autoscaling was designed to handle traffic spikes, but you need to show at least your scaling config, give details about how your app behaves and what's not scaling (see for example https://stackoverflow.com/questions/47650500/concurrent-requests-handling-on-google-app-engine), and define what `better` means in your context (cost, performance, etc.). As-is your question is too broad. – Dan Cornilescu Feb 15 '18 at 15:40

1 Answer


Apart from changing the instance class of your App Engine Standard application based on the expected demand, you can (and should) also consider a good scaling approach for your application. App Engine Standard offers three scaling types, which are documented in detail, but let me summarize their main features here:

  • Automatic scaling: based on request rate, latency in the responses and other application metrics. This is probably the best option for the use case you present, as more instances will be spun up in response to demand.
  • Manual scaling: continuously running, instances preserve state and you can configure the exact number of instances you want running. This can be useful if you already know how to handle your demand from previous occurrences of the spikes in usage.
  • Basic scaling: the number of instances scales with request volume, and you can set the maximum number of instances that can be serving.
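As a sketch, each scaling type is selected with its own block in `app.yaml` (only one block can be present per service; the numbers below are hypothetical, not recommendations):

```yaml
# Option 1: automatic scaling (the default for App Engine Standard).
# Instances are added and removed based on request rate and latency.
automatic_scaling:
  min_idle_instances: 1

# Option 2: manual scaling. A fixed number of continuously running,
# state-preserving instances that you choose yourself.
# manual_scaling:
#   instances: 5

# Option 3: basic scaling. Instances are created on demand and shut
# down after being idle, up to a maximum that you set.
# basic_scaling:
#   max_instances: 10
#   idle_timeout: 10m
```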

Based on the use case you presented in your question, I think automatic scaling is the type that best matches your requirements, so let me go a little more in-depth on the parameters you can tune when using it:

  • Concurrent requests to be handled by each instance: set up the maximum number of concurrent requests that can be accepted by an instance before spinning up a new instance.
  • Idle instances available: how many idle (not serving traffic) instances should be ready to handle traffic. You can tune this parameter to be higher when you have the traffic spike, so that requests are handled in a short time without having to wait for an instance to be spun up. After the peak you can set it to a lower value to reduce the costs of the deployment.
  • Pending latency: the time a request is allowed to wait in the pending queue (when no instance can handle it) before a new instance is started.
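Putting the three parameters together, a "sale-time" configuration might look like the following `app.yaml` fragment (the specific values here are illustrative assumptions; you would tune them against your own traffic):

```yaml
instance_class: F4_1G

automatic_scaling:
  # A new instance is spun up once existing ones are each
  # handling this many concurrent requests.
  max_concurrent_requests: 50
  # Warm idle instances kept ready so the notification-driven
  # spike is absorbed without cold-start delays.
  min_idle_instances: 10
  max_idle_instances: 15
  # How long a request may sit in the pending queue before a
  # new instance is started.
  min_pending_latency: 100ms
  max_pending_latency: 500ms
```

After the sale window you could deploy a version with a lower `min_idle_instances` (and a smaller instance class) to cut costs back down.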

If you play around with these configuration parameters, you can control quite deterministically how many instances you run, accommodating the big spikes and later returning to lower values in order to decrease usage and cost.

One additional note to take into account when using automatic scaling: after a traffic spike, you may see more idle instances running than you specified (they are not torn down immediately, to avoid having to start new instances again), but you will only be billed for up to the max_idle_instances that you specified.
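Concretely, the billing cap comes from the `max_idle_instances` line in your `automatic_scaling` block; a minimal sketch (values hypothetical):

```yaml
automatic_scaling:
  min_idle_instances: 1
  # After a spike, more than 5 idle instances may linger for a
  # while, but you are only billed for up to 5 of them.
  max_idle_instances: 5
```

This makes `max_idle_instances` a useful lever for bounding post-spike cost without touching the rest of your scaling settings.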

dsesto