I have implemented a client application that sends requests to a server. The way it works can be described very simply: I specify a number of threads, and each of these threads repeatedly sends a request to the server and waits for the answer.
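To make the setup concrete, here is a simplified, self-contained sketch of what each thread does (the class name, the local echo server, the one-byte "request", and the thread/request counts are placeholders for illustration, not my actual code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicLong;

public class BlockingClientSketch {
    public static void main(String[] args) throws Exception {
        // Toy local echo server standing in for the real server.
        ServerSocket server = new ServerSocket(0);
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket s = server.accept();
                    new Thread(() -> {
                        try (Socket c = s;
                             InputStream in = c.getInputStream();
                             OutputStream out = c.getOutputStream()) {
                            int b;
                            while ((b = in.read()) != -1) {
                                out.write(b);   // echo each byte back
                                out.flush();
                            }
                        } catch (IOException ignored) { }
                    }).start();
                }
            } catch (IOException ignored) { }
        });
        acceptor.setDaemon(true);
        acceptor.start();

        int numThreads = 4;           // the parameter varied in the plot
        int requestsPerThread = 100;
        AtomicLong completed = new AtomicLong();

        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            workers[i] = new Thread(() -> {
                try (Socket sock = new Socket("localhost", server.getLocalPort());
                     OutputStream out = sock.getOutputStream();
                     InputStream in = sock.getInputStream()) {
                    for (int r = 0; r < requestsPerThread; r++) {
                        out.write(42);          // send a one-byte "request"
                        out.flush();
                        in.read();              // block until the reply arrives
                        completed.incrementAndGet();
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        System.out.println("completed=" + completed.get());
    }
}
```

Throughput is then the total number of completed request/response round trips divided by elapsed time.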
I have plotted the total throughput of the client for various numbers of threads. The exact number of virtual clients is not important; I am interested in the maximal, saturated performance, at the very right of the graph.
I am surprised, because I did not expect the performance to scale with the number of threads. Indeed, most of the processor time is spent in blocking I/O in Java (blocking sockets), as the client-server communication has a 1 ms latency, and the client is running on an 8-core machine.
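One way to see that a thread blocked on a socket read consumes essentially no CPU is to compare wall-clock time to per-thread CPU time around the blocking call. This toy example (placeholder names; the server delay is exaggerated to 100 ms so the effect is obvious, versus my actual ~1 ms latency) uses `ThreadMXBean` for the measurement:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingCpuDemo {
    public static void main(String[] args) throws Exception {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Toy server that sleeps 100 ms before answering, standing in
        // for network latency (exaggerated for clarity).
        ServerSocket server = new ServerSocket(0);
        Thread srv = new Thread(() -> {
            try (Socket c = server.accept();
                 OutputStream out = c.getOutputStream()) {
                c.getInputStream().read();  // wait for the request byte
                Thread.sleep(100);          // simulated server latency
                out.write(1);
                out.flush();
            } catch (Exception ignored) { }
        });
        srv.start();

        try (Socket sock = new Socket("localhost", server.getLocalPort());
             OutputStream out = sock.getOutputStream();
             InputStream in = sock.getInputStream()) {
            long wallStart = System.nanoTime();
            long cpuStart = bean.getCurrentThreadCpuTime();
            out.write(1);
            out.flush();
            in.read();                      // blocks for ~100 ms
            long wallMs = (System.nanoTime() - wallStart) / 1_000_000;
            long cpuMs = (bean.getCurrentThreadCpuTime() - cpuStart) / 1_000_000;
            // Wall time is ~100 ms while CPU time stays near zero: the
            // blocked thread burns no CPU while it waits.
            System.out.println("wall=" + wallMs + "ms cpu=" + cpuMs + "ms");
        }
        srv.join();
    }
}
```

So during the 1 ms round trip, the core is free to run other threads, which would explain scaling well past the core count.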
I have looked for solutions online; this answer on Quora seems to imply that the time a thread spends waiting in blocking I/O can be used by the scheduler to run other tasks. Is it true, specifically for Java blocking sockets? In that case, why don't I get linear scaling with the number of threads?
In case it matters, I am running this application in the cloud. Also, this is part of a larger application, but I have identified this component as the bottleneck of the whole setup.