I'm using a virtual machine (16 vCPU, 32 GB RAM, 100 GB disk) on Compute Engine, with the specs listed below. As far as I understand it, the machine has 8 cores, each able to run 2 threads at the same time, giving 16 threads in total.

What I am doing:

  • I am querying a Docker service from a Python client. The input is a PDF; the output is a parsed file. The Docker service has a concurrency of 15.
  • I am running 15 threads in a ThreadPoolExecutor (the relevant lines are pasted below).

The issue:

  • Out of the 1800 requests I made, only 970 (barely more than half) succeeded. The rest timed out with a 408 error.
  • I know other performance parameters could affect the timeouts, but the machine is fairly robust, and the same tasks run on my much less powerful local machine with fewer timeouts.

What I tried to fix it:

  • I've tried lowering the number of threads, but I still get a significant number of timeouts. I thought the bottleneck might be the Docker service, but since I haven't enforced any limits on the container, it should be able to use all the available resources.

Any idea what the root cause of this issue might be in my setup? How can I solve it?

Machine Specs (lscpu)

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping:                        0
CPU MHz:                         2200.208
BogoMIPS:                        4400.41
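
As a quick check of the arithmetic behind the lscpu output above, the sketch below multiplies the three topology fields to get the number of logical CPUs the OS can schedule on. The helper name `logical_cpus` is hypothetical and used only for illustration; the input values are copied from the listing.

```python
import os


def logical_cpus(threads_per_core, cores_per_socket, sockets):
    """Return the number of hardware threads the OS can schedule on at once."""
    return threads_per_core * cores_per_socket * sockets


# Values copied from the lscpu output above.
expected = logical_cpus(threads_per_core=2, cores_per_socket=4, sockets=1)
print(expected)        # 8, matching the "CPU(s): 8" line

# What Python reports on whichever machine runs this script.
print(os.cpu_count())
```

With these values the product is 8, not 16: the 2 threads per core are already included in the `CPU(s)` count.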

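Thread pool (lines taken from the working script, for illustration)

    import concurrent.futures

    # Fragment from inside a method of the client class (hence self).
    with concurrent.futures.ThreadPoolExecutor(max_workers=15) as executor:
        results = []
        for input_file in input_files:
            selected_process = self.process_pdf
            # Pass input_file so each task parses its own PDF.
            r = executor.submit(selected_process, input_file)
            results.append(r)
    # Collect the results once the pool has finished.
    for r in concurrent.futures.as_completed(results):
        input_file, status, text = r.result()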
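
To see how many requests fail and which inputs they belong to, the pattern above can be extended to map each future back to its file and count outcomes. This is a minimal sketch, not the original script: `process_pdf` here is a hypothetical stand-in that fakes a timeout for one file name, `run_all` is an illustrative helper, and `max_workers=8` is an assumption based on the 8 logical CPUs reported by lscpu.

```python
import concurrent.futures
from collections import Counter


def process_pdf(input_file):
    """Hypothetical stand-in for the real request to the Docker service."""
    if input_file.endswith("bad.pdf"):
        raise TimeoutError("408")  # simulate a request that timed out
    return input_file, "ok", "parsed text"


def run_all(input_files, max_workers=8):
    """Submit every file and count outcomes by status."""
    outcomes = Counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its input so failures can be attributed.
        futures = {executor.submit(process_pdf, f): f for f in input_files}
        for future in concurrent.futures.as_completed(futures):
            try:
                _, status, _ = future.result()
            except TimeoutError:
                status = "timeout"
            outcomes[status] += 1
    return outcomes


print(run_all(["a.pdf", "bad.pdf", "c.pdf"]))
```

Running the same count at several `max_workers` values is one way to check whether the timeouts scale with client-side concurrency.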
Matthieu
  • Your statement is incorrect: **8 cores with each the ability to run 2 threads at the same time - giving 16 threads in total**. In the cloud you have vCPUs. Therefore only 8 threads. – John Hanley Aug 17 '22 at 22:46
  • You can also use Python's multiprocessing module for this. I also found a thread about per-thread timeouts in a Python thread pool here: https://stackoverflow.com/questions/25976350/timeout-for-each-thread-in-threadpool-in-python/25977928#25977928 Additionally, try adding a timeout (for example timeout=1) when using concurrent.futures.ThreadPoolExecutor with max_workers=15. – Jeffrey D. Aug 18 '22 at 02:33
  • @JohnHanley As I understand it, the Intel processor above implements hyperthreading, which would allow 2 threads per core (as seen in the specs). Correct me if I am wrong; I would be interested in knowing why. – Matthieu Aug 18 '22 at 05:41
  • In the cloud, you get one vCPU. A vCPU is equivalent to a hyperthread. That is why you see the term `vCPU` and not `CPU Core` in the documentation, console GUI, etc. The details of processor virtualization are documented. Google search for more details. – John Hanley Aug 18 '22 at 05:45
  • In addition to @JohnHanley's comment, you can also refer to this documentation on setting the number of threads per core: https://cloud.google.com/compute/docs/instances/set-threads-per-core – Jeffrey D. Aug 18 '22 at 06:05
  • @JeffreyD. - Very good link – John Hanley Aug 18 '22 at 06:18
  • There is a point at which you pollute the UI and documentation by mentioning every little detail over and over. – John Hanley Aug 18 '22 at 06:21
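
On the timeout suggested in the comments: in concurrent.futures a timeout is not a ThreadPoolExecutor constructor argument; it is passed to `Future.result` or `as_completed`. The sketch below shows the `Future.result(timeout=...)` form. The names `slow_task` and `results_with_timeout`, the sleep durations, and `max_workers=4` are all assumptions chosen for illustration.

```python
import concurrent.futures
import time


def slow_task(seconds):
    """Hypothetical task that just sleeps, standing in for a slow request."""
    time.sleep(seconds)
    return seconds


def results_with_timeout(durations, timeout):
    """Wait at most `timeout` seconds per future; record None on timeout."""
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(slow_task, d) for d in durations]
        for future in futures:
            try:
                results.append(future.result(timeout=timeout))
            except concurrent.futures.TimeoutError:
                # The wait is abandoned, but the worker thread keeps running.
                results.append(None)
    return results


print(results_with_timeout([0.01, 0.5], timeout=0.2))
```

Note that this only bounds how long the client waits for a result. It does not cancel the running task, and leaving the `with` block still waits for every worker to finish.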
