I have an executable that supports multithreading, and I'm running it on Google Cloud. With 8 vCPUs reserved and the job running on 8 threads, I get an execution time of y. With 16 vCPUs reserved but the job still running on only 8 threads, I get an execution time of x. What I'm noticing is that x is about 15-20% less than y. Why do I get this performance benefit when I reserve more vCPUs but use fewer threads than vCPUs?
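To make the comparison concrete, here is a minimal sketch of how the two runs could be measured. It is not the actual executable: `burn` is a hypothetical CPU-bound stand-in, worker processes stand in for the job's threads, and `timed_run` and `percent_less` are illustrative helper names. It prints the number of logical CPUs the VM exposes, times 8 parallel workers, and provides a helper for expressing how much lower x is than y.

```python
import os
import time
from multiprocessing import Pool


def burn(n):
    """CPU-bound stand-in for one worker of the real job (hypothetical)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(workers, n):
    """Run `workers` copies of burn(n) in parallel; return wall-clock seconds."""
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(burn, [n] * workers)
    return time.perf_counter() - start


def percent_less(y, x):
    """How much lower x is than y, as a percentage of y."""
    return 100.0 * (y - x) / y


if __name__ == "__main__":
    # Number of logical CPUs the OS reports on this machine.
    print("logical CPUs visible to the OS:", os.cpu_count())
    # Always 8 workers, matching the 8 threads used in both runs.
    elapsed = timed_run(workers=8, n=2_000_000)
    print(f"8 workers: {elapsed:.2f} s")
```

Running the same script on the 8-vCPU and the 16-vCPU machine gives a y and an x that can be passed to `percent_less(y, x)` to check whether the gap also appears with a trivial workload.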
Any help will be appreciated. Thanks.