I have deployed some simple services as a proof of concept: an nginx web server, patched as described in https://stackoverflow.com/a/8217856/735231 for high performance.
I also edited /etc/nginx/conf.d/default.conf so that the line listen 80; becomes listen 80 http2;.
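The relevant server block in default.conf then looks like this (a sketch; the rest of the file is the stock content):

server {
    listen       80 http2;  # HTTP/2 over cleartext (h2c), no TLS
    server_name  localhost;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }
}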
I am using the Locust distributed load-testing tool, with a class that swaps the requests module for hyper in order to test HTTP/2 workloads. This may not be optimal in terms of performance, but I can spawn many Locust workers, so it's not a huge concern.
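The client swap looks roughly like this (a minimal sketch against the pre-1.0 Locust API; the wrapper, the h2test-nginx Service name, and the request path are illustrative, not my exact code):

import time

from hyper import HTTP20Connection  # prior-knowledge h2c, matching "listen 80 http2"
from locust import Locust, TaskSet, task, events


class HyperClient(object):
    """Thin wrapper that reports hyper request timings to Locust's stats."""

    def __init__(self, host):
        self.conn = HTTP20Connection(host)

    def get(self, path):
        start = time.time()
        try:
            self.conn.request("GET", path)
            body = self.conn.get_response().read()
        except Exception as e:
            events.request_failure.fire(
                request_type="GET", name=path,
                response_time=(time.time() - start) * 1000, exception=e)
        else:
            events.request_success.fire(
                request_type="GET", name=path,
                response_time=(time.time() - start) * 1000,
                response_length=len(body))


class UserBehavior(TaskSet):
    @task
    def index(self):
        self.client.get("/")


class H2User(Locust):
    task_set = UserBehavior
    min_wait = 0  # no think time: issue requests as fast as possible
    max_wait = 0

    def __init__(self):
        super(H2User, self).__init__()
        # "h2test-nginx:80" is the assumed in-cluster Service for the nginx pods
        self.client = HyperClient("h2test-nginx:80")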
For testing, I spawned a GKE cluster of 5 machines (2 vCPUs and 4 GB RAM each), installed Helm, and installed the charts for these services (I can post them in a gist later if useful).
I ran Locust with min_wait=0 and max_wait=0 so that it spawned as many requests as possible, with 10 workers against a single nginx instance (the launch commands are sketched after these results).
10 workers, 140 "clients" total: ~2.1k requests per second (RPS)
10 workers, 260 clients: ~2.0k RPS
10 workers, 400 clients: ~2.0k RPS
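For completeness, the master and workers are launched with something like this (pre-1.0 Locust flags; lt-master is the assumed Service name matching the master pod shown below):

locust -f locustfile.py --master --no-web -c 140 -r 20
locust -f locustfile.py --slave --master-host=lt-master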
Now, I try to scale horizontally: I spawn 5 nginx instances and get:
10 workers, 140 clients: ~2.1k RPS
10 workers, 280 clients: ~2.1k RPS
20 workers, 140 clients: ~1.7k RPS
20 workers, 280 clients: ~1.9k RPS
20 workers, 400 clients: ~1.9k RPS
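The scale-out itself is just a replica bump (h2test-nginx is the Deployment name implied by the pod names below):

kubectl scale deployment h2test-nginx --replicas=5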
The resource usage is quite low, as shown by kubectl top pod (this is for 10 workers, 280 clients; nginx is not resource-limited, Locust workers are limited to 1 CPU per pod):
user@cloudshell:~ (project)$ kubectl top pod
NAME                           CPU(cores)   MEMORY(bytes)
h2test-nginx-cc4d4c69f-4j267   34m          68Mi
h2test-nginx-cc4d4c69f-4t6k7   27m          68Mi
h2test-nginx-cc4d4c69f-l942r   30m          69Mi
h2test-nginx-cc4d4c69f-mfxf8   32m          68Mi
h2test-nginx-cc4d4c69f-p2jgs   45m          68Mi
lt-master-5f495d866c-k9tw2     3m           26Mi
lt-worker-6d8d87d6f6-cjldn     524m         32Mi
lt-worker-6d8d87d6f6-hcchj     518m         33Mi
lt-worker-6d8d87d6f6-hnq7l     500m         33Mi
lt-worker-6d8d87d6f6-kf9lj     403m         33Mi
lt-worker-6d8d87d6f6-kh7wt     438m         33Mi
lt-worker-6d8d87d6f6-lvt6j     559m         33Mi
lt-worker-6d8d87d6f6-sxxxm     503m         34Mi
lt-worker-6d8d87d6f6-xhmbj     500m         33Mi
lt-worker-6d8d87d6f6-zbq9v     431m         32Mi
lt-worker-6d8d87d6f6-zr85c     480m         33Mi
I ran this test on GKE for easier replication, but I have gotten the same results in a private-cloud cluster.
Why does it seem not to matter how many instances of a service I spawn?
UPDATE: As per the first answer, I'm adding information on the nodes and on what happens with a single Locust worker.
1 worker, 1 client: 22 RPS
1 worker, 2 clients: 45 RPS
1 worker, 4 clients: 90 RPS
1 worker, 8 clients: 174 RPS
1 worker, 16 clients: 360 RPS
1 worker, 32 clients: 490 RPS
1 worker, 40 clients: 480 RPS (this seems to be above the maximum sustainable number of clients per worker)
But above all, it seems that the root problem is that I'm at the limit of node CPU capacity:
user@cloudshell:~ (project)$ kubectl top node
NAME                                 CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
gke-sc1-default-pool-cbbb35bb-0mk4   1903m        98%    695Mi           24%
gke-sc1-default-pool-cbbb35bb-9zgl   2017m        104%   727Mi           25%
gke-sc1-default-pool-cbbb35bb-b02k   1991m        103%   854Mi           30%
gke-sc1-default-pool-cbbb35bb-mmcs   2014m        104%   776Mi           27%
gke-sc1-default-pool-cbbb35bb-t6ch   1109m        57%    743Mi           26%
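To spell out the arithmetic: the five nodes together report 1903m + 2017m + 1991m + 2014m + 1109m ≈ 9034m of the 10000m raw capacity (5 nodes × 2 vCPU), and four of the five are at or above 98% of their allocatable CPU, so the cluster has essentially no headroom left.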