
If a website has a door-crasher sale where many people (~50K) are waiting for a countdown to finish before entering the page, how would one tackle this with GKE in a cost-efficient way?

That seems to be the very reason GKE exists: with the cluster autoscaler and HPA, GKE should be able to handle the traffic. In practice, however, it is a different story: when the autoscaler creates nodes and pulls the container images, it can take a while (perhaps a minute or two in some cases). During that time users see 5XX errors, which is not ideal.

Well, to tackle that, over-provisioning with paused pods comes to mind. But considering that the servers are generally very small (they only need to handle ~100 requests on a normal day) and then suddenly face 50K in a second, how is this a feasible solution? Paused pods only seem to ensure that the autoscaler doesn't remove nodes that are not doing any work, so ~50 nodes would have to be permanently occupied by paused pods, and I assume those node hours are still billable in GKE (the nodes exist, they just do nothing).
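
For reference, the over-provisioning pattern usually looks roughly like the sketch below (all names, replica counts, and sizes here are hypothetical): a low-priority Deployment of `pause` containers reserves node capacity, real pods (default priority 0) preempt the placeholders immediately, and the cluster autoscaler then re-creates the evicted placeholders on new nodes in the background.

```yaml
# Sketch only - all names, replica counts, and sizes are assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10                         # lower than any real workload
globalDefault: false
description: "Placeholder pods that real workloads may preempt."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-reservation
spec:
  replicas: 50                     # roughly the burst headroom to hold
  selector:
    matchLabels:
      app: capacity-reservation
  template:
    metadata:
      labels:
        app: capacity-reservation
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # does nothing, just sleeps
          resources:
            requests:              # sized to roughly fill one n1-standard-1
              cpu: "800m"
              memory: "2Gi"
```

As noted above, those ~50 reserved nodes are billed whether or not they serve traffic; the pattern buys speed, not savings.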

What would be a feasible solution to serve 100 requests a day on n1-standard-1 nodes, but also be able to scale to ~50K requests in less than 10 seconds?

  • There is official documentation about handling traffic on GKE: [Cloud.google.com: Best practices for running cost-effective Kubernetes applications on GKE](https://cloud.google.com/solutions/best-practices-for-running-cost-effective-kubernetes-applications-on-gke), specifically the part on preparing cloud-based Kubernetes applications. – Dawid Kruk Sep 30 '20 at 13:02
  • Very good article, Dawid. Thank you. – Greg Sep 30 '20 at 18:36

1 Answer


Not as fast as 10 seconds; that's reachable only if you go serverless.

Pod autoscaling at best takes 20-30 seconds (depending on your readiness probes, the load balancer's health checks, image caching, etc.). But you still have to keep a pool of nodes large enough to fit that capacity, which costs the same money - you're right.
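
To illustrate the probe part of that window, here is a minimal sketch (image, port, and endpoint are placeholders): aggressive probe timing can shave tens of seconds off the time a new pod takes to become Ready compared to lax defaults.

```yaml
# Sketch only - image, port, and path are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: probe-timing-demo
spec:
  containers:
    - name: app
      image: nginx:1.25            # stand-in for the real server image
      ports:
        - containerPort: 80
      readinessProbe:
        httpGet:
          path: /                  # assumed health endpoint
          port: 80
        initialDelaySeconds: 2     # start probing almost immediately
        periodSeconds: 3           # probe frequently
        successThreshold: 1        # mark Ready on the first success
```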

Node-plus-pod autoscaling takes around 5 minutes.

If you go serverless, make sure you know (and perhaps increase) your account limits. Because it scales so fast and is billed per invocation, it used to be very easy to accidentally blow up your bill. That's why all providers limit the default number of concurrent function executions; e.g. AWS allows 1000 per account by default: https://aws.amazon.com/about-aws/whats-new/2017/05/aws-lambda-raises-default-concurrent-execution-limit/. This can be increased through support.

I recall this post for AWS: https://aws.amazon.com/blogs/startups/from-0-to-100-k-in-seconds-instant-scale-with-aws-lambda/. Unfortunately I haven't seen similar write-ups for Google Cloud Functions, but I'm sure they have very similar capabilities.

  • Max, thank you for your answer and suggestions (especially the last link). The problem with serverless is all the configuration that is no longer under your control. Also, even Lambda has provisioned concurrency, which is essentially keeping some capacity idle to handle traffic. I am interested in a solution that uses GKE and somehow handles large (anticipated) loads. To be fair, I will wait a bit to see if anyone (or you) can think of a solution, and then I will accept your answer (because honestly I don't know if such a solution is available at this point). – Greg Sep 30 '20 at 03:10
  • As a wild idea, would you recommend any semi-automatic or automatic scheduling of capacity (which sounds redundant given the autoscaler and the entire concept of Kubernetes Engine) to tackle this problem? – Greg Sep 30 '20 at 03:19
  • Yes, lambdas have to be "warmed up" too, but that is still way faster than 5 minutes. Also, provisioned concurrency works on a schedule, so you could try guessing your burst window and prepare: https://aws.amazon.com/blogs/compute/scheduling-aws-lambda-provisioned-concurrency-for-recurring-peak-usage/ – Max Lobur Sep 30 '20 at 10:44
  • Speaking of guessing the burst window: you could try this daemon https://github.com/hjacobs/kube-downscaler#command-line-options with the `--upscale-period` flag in conjunction with the cluster autoscaler https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler . The cluster autoscaler has a predicate called PodFitsResources which will notice that you asked for more pods than the cluster can fit, and scale up. Even without actual load, the number of nodes will go up just to run all the pods. Make sure you set resource requests and limits on the deployment in this case. – Max Lobur Sep 30 '20 at 10:52
  • k8s is the most robust and advanced scheduler today, if you still wanna go containers I'd stick to it for sure. Note that serverless is also not limited to PaaS products, there's https://www.openfaas.com/blog/introducing-faasd/ , but you still will have to deal with predictive/planned scaling of the underlying servers. – Max Lobur Sep 30 '20 at 11:00
  • Thank you Max. This will be my last comment, so please bear with me. Do you think a simple CronJob that modifies the replica count of a deployment would be too hacky a way to warm up nodes? The workflow would be: the main code sends the start and end time to a CronJob, and the CronJob modifies the replicas 5 min before the start time and 5 min after the end time. – Greg Sep 30 '20 at 17:36
  • Sure, why not (a minimal sketch of that CronJob approach follows this thread). I think `kube-downscaler` is essentially what you described, except that the schedule is set via a Deployment/HPA annotation instead of being dictated by the main code. At least you can take the k8s RBAC and part of the code from there if you want your own :) – Max Lobur Sep 30 '20 at 17:40
  • This is a good OSS candidate, by the way; I've been asked about an `external autoscaler` already: https://stackoverflow.com/questions/63709968/how-to-supply-external-metrics-into-hpa/63711147#comment112682415_63711147 – Max Lobur Sep 30 '20 at 17:44
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/222323/discussion-between-greg-and-max-lobur). – Greg Sep 30 '20 at 18:37
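
A minimal sketch of the CronJob idea discussed above, under some assumptions: a Deployment named `storefront` in namespace `shop` (both hypothetical), a fixed 12:00 UTC sale time baked into the schedule (in practice the main code would create or patch these CronJobs), and Kubernetes 1.21+ for `batch/v1` (older clusters would use `batch/v1beta1`).

```yaml
# Sketch only - names, namespace, schedule, and replica counts are assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scaler
  namespace: shop
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scaler
  namespace: shop
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "deployments/scale"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scaler
  namespace: shop
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: scaler
subjects:
  - kind: ServiceAccount
    name: scaler
    namespace: shop
---
# Scale up 5 minutes before the sale opens; a twin CronJob with e.g.
# schedule "5 13 * * *" and --replicas=2 scales back down afterwards.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: presale-scale-up
  namespace: shop
spec:
  schedule: "55 11 * * *"          # 11:55 UTC, 5 min before a 12:00 sale
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.28    # any image with kubectl works
              command:
                - kubectl
                - scale
                - deployment/storefront
                - --replicas=50
                - -n
                - shop
```

With resource requests set on `storefront`, the jump to 50 replicas makes PodFitsResources fail on the existing nodes, so the cluster autoscaler starts adding nodes before any real traffic arrives - the same effect `kube-downscaler`'s `--upscale-period` achieves declaratively.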