
Hope you can help me with this!

What is the best approach to determine and set resource requests and limits per pod?

My idea was to define an expected amount of traffic and write some load tests, then start a single pod with "low" limits and run the load test until the pod gets OOMKilled. I would then raise the memory step by step (something like overclocking) until I find the bottleneck, then do the same for CPU until everything is "stable", and so on. I would use that stable value as the request and set the limit to double the request (or some other safe value based on the results). Finally, I would scale out to a fixed number of pods sized for average traffic and configure pod autoscaling rules for peak production load.
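To make the arithmetic concrete, here is a small sketch of the sizing rule described above. The function name and the sample numbers (256Mi / 200m) are made up for illustration, not taken from any real load test:

```python
def derive_resources(stable_mem_mi: int, stable_cpu_m: int,
                     safety_factor: float = 2.0) -> dict:
    """Turn load-test results into a requests/limits pair.

    stable_mem_mi / stable_cpu_m: the highest memory (Mi) and CPU
    (millicores) at which the pod stayed stable under load.
    safety_factor: limit = request * factor; "double the request"
    from the approach above means 2.0.
    """
    return {
        "requests": {"memory": f"{stable_mem_mi}Mi",
                     "cpu": f"{stable_cpu_m}m"},
        "limits": {"memory": f"{int(stable_mem_mi * safety_factor)}Mi",
                   "cpu": f"{int(stable_cpu_m * safety_factor)}m"},
    }

# Hypothetical load-test result: stable at 256Mi memory / 200m CPU.
print(derive_resources(256, 200))
# -> requests 256Mi/200m, limits 512Mi/400m
```

The resulting dict maps directly onto the `resources` block of a container spec.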

Is this a good approach? What tools and metrics do you recommend? I'm using prometheus-operator for monitoring and vegeta for load testing.

What about vertical pod autoscaling? Have you used it? Is it production-ready?

BTW: I'm using the AWS managed solution, deployed with a Terraform module.

Thanks for reading

2 Answers


I usually start my pods with neither limits nor requests set. Then I leave them running for a while under normal load to collect metrics on resource consumption.

I then set memory and CPU requests to +10% of the max consumption I got in the test period and limits to +25% of the requests.

This is just an example strategy, as there is no one size fits all approach for this.
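As a sketch of that rule (the function name and the sample numbers are mine, not part of the answer):

```python
def size_from_metrics(max_mem_mi: float, max_cpu_m: float) -> dict:
    """Requests = observed max consumption + 10%; limits = requests + 25%."""
    req_mem, req_cpu = max_mem_mi * 1.10, max_cpu_m * 1.10
    return {
        "requests": {"memory": f"{round(req_mem)}Mi",
                     "cpu": f"{round(req_cpu)}m"},
        "limits": {"memory": f"{round(req_mem * 1.25)}Mi",
                   "cpu": f"{round(req_cpu * 1.25)}m"},
    }

# E.g. an observed max of 100Mi memory / 500m CPU:
print(size_from_metrics(100, 500))
```

For an observed max of 100Mi this yields a request of 110Mi, matching the clarification in the comments below.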

whites11
  • To clarify, when you say +10%, do you mean 10% over the max consumption or 10% of the max consumption? – Almenon Feb 01 '21 at 19:50
  • 10% over the max, so if the max usage is 100MB I would set 110MB as the request, for example. – whites11 Feb 02 '21 at 20:04
  • Oh, interesting. Why do you set the request over the max? The advice in https://stackoverflow.com/a/56981709/6629672 and my boss suggest doing the average. – Almenon Feb 02 '21 at 23:41
  • As I mentioned, it is just one possible strategy that might not work for you. My reasoning was that during the initial test period you might not catch all of your app's memory needs, so giving it some more room in the request limits the chance of ending up with nodes under memory pressure. Using the average sounds like a good strategy as well. – whites11 Feb 04 '21 at 05:38

The VerticalPodAutoscaler is more about making sure that a Pod can run at all. It starts the Pod low and doubles the memory each time it gets OOMKilled, which can potentially lead to a Pod hogging resources. It is also limited in that it doesn't take under-performance into account: if your app is under-resourced, it might still respond, just not within a timeframe you consider acceptable.
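The "start low and double on OOMKill" loop described above can be sketched like this (a deliberate simplification for illustration; the real VPA recommender is more sophisticated):

```python
def memory_after_restarts(initial_mi: int, needed_mi: int,
                          max_restarts: int = 10) -> int:
    """Double the memory allocation each time the app is OOMKilled,
    i.e. each time the current allocation is below what it needs."""
    mem = initial_mi
    for _ in range(max_restarts):
        if mem >= needed_mi:   # the app fits: no more OOMKills
            break
        mem *= 2               # OOMKilled: restart with double the memory
    return mem

# A pod that really needs 500Mi, started at 64Mi, converges after a
# few restarts -- but lands on 512Mi, not 500Mi.
print(memory_after_restarts(64, 500))
```

A leaking app keeps triggering the doubling indefinitely, which is how a Pod can end up hogging resources as described above.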

I think you are taking a good approach, since you are looking at the application under load and assessing what it needs to perform the way you want it to. I doubt I can suggest any tools you aren't already aware of, but if it helps, there is some more discussion in How to set the right cpu millicores for a container? and the threads that link from it.

Ryan Dawson