I'm using a virtual machine (16 vCPU, 32 GB RAM, 100 GB disk) on Compute Engine, with the specs listed below. As far as I understand it, the machine has 8 cores, each able to run 2 threads at the same time, giving 16 threads in total.
What I am doing:
- I am querying a Docker service from a Python client. The input is a PDF, the output is a parsed file. The Docker service has a concurrency of 15.
- I am running 15 threads in a ThreadPoolExecutor (the relevant lines are pasted below).
The issue:
- Out of 1800 requests that I made, only 970 of them - barely more than half - actually succeeded. The rest timed out with a 408 error.
- I know other performance parameters could affect the timeouts, but the machine is a fairly robust one, and the same tasks run on my local machine, which is much less powerful, with fewer timeouts.
What I tried to fix it:
- I've tried lowering the number of threads, but I'm still getting a significant number of timeouts. I thought the bottleneck might be the Docker service, but since I haven't enforced any limits on the container, it should be able to use all the resources available.
Any idea what might be the root cause of this issue in my setup? How could I solve it?
Machine specs (lscpu):
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping: 0
CPU MHz: 2200.208
BogoMIPS: 4400.41
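As a sanity check on the listing above, the logical CPU count should equal sockets x cores per socket x threads per core. Here is a minimal sketch of that arithmetic, with the values copied from the lscpu output; the function name `logical_cpus` is only for illustration.

```python
import os


def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical CPUs = sockets * cores per socket * threads per core."""
    return sockets * cores_per_socket * threads_per_core


# Values copied from the lscpu listing above.
expected = logical_cpus(sockets=1, cores_per_socket=4, threads_per_core=2)
print("logical CPUs from lscpu fields:", expected)

# What Python sees on the host where this runs (may differ inside a container).
print("os.cpu_count():", os.cpu_count())
```

With these fields the product is 8, which matches the `CPU(s): 8` line in the listing rather than the 16 threads I assumed from the machine type.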
Thread pool (lines taken from the working script to illustrate):
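with concurrent.futures.ThreadPoolExecutor(max_workers=15) as executor:
    results = []
    for input_file in input_files:
        selected_process = self.process_pdf
        # input_file is presumably passed to the worker here;
        # the pasted excerpt dropped the argument.
        r = executor.submit(selected_process, input_file)
        results.append(r)
    # Collect results as each request finishes.
    for r in concurrent.futures.as_completed(results):
        input_file, status, text = r.result()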
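To make the excerpt above reproducible, here is a self-contained sketch of the same pattern that also tallies the returned status codes, so the split between successes and 408s can be compared across different `max_workers` values. The `process_pdf` below is a hypothetical stand-in that does not contact the Docker service, and `run_batch` is a name made up for this example.

```python
import concurrent.futures
from collections import Counter


def process_pdf(input_file):
    # Hypothetical stand-in for the real method, which would send the PDF
    # to the Docker service. Here it simply reports success.
    return input_file, 200, f"parsed {input_file}"


def run_batch(input_files, worker=process_pdf, max_workers=15):
    """Submit one task per file and count the status codes that come back."""
    status_counts = Counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(worker, f) for f in input_files]
        for future in concurrent.futures.as_completed(futures):
            _input_file, status, _text = future.result()
            status_counts[status] += 1
    return status_counts


if __name__ == "__main__":
    print(run_batch([f"doc_{i}.pdf" for i in range(20)]))
```

Swapping the stand-in for the real request function and calling `run_batch` with, say, `max_workers=4`, `8` and `15` would show whether the share of 408 responses changes with the client-side concurrency.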