
Python multithreading is usually recommended only for IO-bound tasks, because Python has a global interpreter lock (GIL) that allows only one thread at a time to hold control of the Python interpreter. However, the question "Does multithreading make sense for IO-bound operations?" says that, in general, multithreading in disk IO-bound tasks only makes sense if you are accessing more than one disk, given that the bottleneck is the disk.

Given that, if I have several tasks running simultaneously that access a database on a single local disk, is there any advantage in using multithreading, since the bottleneck will be the disk?

Does the answer change if the database is stored on a single remote disk? I guess it might, given that there is another variable which may be the bottleneck: the round-trip time between me and the server.

Alan Evangelista

1 Answer


CPython and PyPy both have problems with threading CPU-bound tasks. Others, like Jython and IronPython, do not.

Sometimes it makes sense to use multithreading or multiprocessing with I/O-bound tasks, because a disk seek is an eon to the CPU, so if you can get some CPU work out of the way while you wait for a disk response, you've done a good thing.

If you write your code to have a tunable amount of parallelism, you can experimentally deduce a good number for your workload.
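As a rough sketch of what "tunable parallelism" could look like, the snippet below times the same batch of work at several thread counts. The names `fake_query`, `time_with_workers`, and `sweep` are hypothetical, and `time.sleep` is only a stand-in for a real database call: it behaves like a latency-bound request, not like a saturated local disk, so the numbers that matter are the ones you measure against your own database.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(task_id, delay=0.02):
    """Hypothetical stand-in for one database query; sleeps to mimic I/O wait."""
    time.sleep(delay)
    return task_id


def time_with_workers(n_workers, n_tasks=16):
    """Run n_tasks fake queries on n_workers threads and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order, so results line up with task ids.
        results = list(pool.map(fake_query, range(n_tasks)))
    assert results == list(range(n_tasks))
    return time.perf_counter() - start


def sweep(worker_counts=(1, 2, 4, 8)):
    """Map each candidate worker count to the time it took for the batch."""
    return {n: time_with_workers(n) for n in worker_counts}


if __name__ == "__main__":
    for n, elapsed in sweep().items():
        print(f"{n} worker(s): {elapsed:.3f}s")
```

Swapping `fake_query` for your real query and widening `worker_counts` should show where adding threads stops helping for your particular disk or server.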

If you write your code to use the concurrent.futures API, you can (mostly) easily flip between threads and processes, since these two executors share a similar interface:

  • concurrent.futures.ThreadPoolExecutor
  • concurrent.futures.ProcessPoolExecutor

This API is available in CPython 3.2 and up, as well as Tauthon 2.8.
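A minimal sketch of that flip, assuming the work is a plain module-level function (processes need to pickle it): the executor class is passed in as a parameter, so switching between threads and processes is a one-argument change. `checksum` and `run_all` are hypothetical names used only for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def checksum(n):
    """Hypothetical CPU-bound task: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_all(executor_cls, func, items, max_workers=4):
    """Apply func to every item using whichever executor class is given."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


if __name__ == "__main__":
    sizes = [10_000, 20_000, 30_000]
    # Same call, different executor: threads first, then processes.
    by_threads = run_all(ThreadPoolExecutor, checksum, sizes)
    by_processes = run_all(ProcessPoolExecutor, checksum, sizes)
    print(by_threads == by_processes)
```

The "(mostly)" matters: with ProcessPoolExecutor the function and its arguments have to be picklable, and the pool should be created under an `if __name__ == "__main__":` guard on platforms that spawn new interpreters.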

Here's an example program: http://stromberg.dnsalias.org/~strombrg/coordinate/

HTH.

dstromberg
  • 1) Given that the problem of parallelizing CPU-bound tasks with threads in Python is caused by the GIL, how can Jython and IronPython avoid it? 2) If the disk seek time is much larger than the processing time in a task (i.e. it is an IO-bound task) and there is only one local disk, it seems to me that the gain from adding several threads in order to get "some CPU work out of the way" will be negligible. Isn't that right? – Alan Evangelista Mar 20 '20 at 21:02
  • Jython and IronPython don't have a GIL. They inherit memory management from their underlying runtimes (e.g. the JRE), which are not GIL-based. – Tim Richardson Aug 10 '21 at 01:48