21

So this is more or less a theoretical question. I have a single-core machine which is supposedly powerful but nevertheless has only one core. Now I have two choices to make:

  1. Multithreading: As far as I know, I cannot make use of multiple cores in my machine even if I had them, because of the GIL. Hence in this situation, it does not make any difference.

  2. Multiprocessing: This is where I have a doubt. Can I do multiprocessing on a single-core machine? Or do I have to check the number of cores available in my machine every time and then run the same number of processes or fewer?

Can someone please guide me on the relation between multiprocessing and the cores in a machine?

I know this is a theoretical question but my concepts are not very clear on this.

Subhayan Bhattacharya
  • You can answer that for yourself: check the number of programs running on your system and compare that with the number of cores. – Klaus D. Sep 23 '18 at 10:56

3 Answers

14

This is a big topic but here are some pointers.

  • Think of threads as processes that share the same address space and can access the same memory. Communication is done by shared variables. Multiple threads can run within the same process.
  • Processes (in this context, and roughly speaking) have their own private data and if two processes want to communicate that communication has to be done more explicitly.
  • When you are writing a program where the bottleneck is CPU cycles, neither threads nor processes will give you a speedup on a single-core machine.
  • Processes and threads are still useful for multitasking (rapid switching between (sub)programs) - this is what your operating system does because it runs far more processes than you have cores.
  • Processes and threads (or even coroutines!) can give you considerable speedup even on a single core machine if the tasks you are executing are I/O bound - think of fetching data from a network. For example, instead of actively waiting for data to be sent or to arrive, another process or thread can initiate the next network operation.
  • Threads are preferable to processes when you don't need explicit encapsulation, due to their lower overhead. For most CPU-bound concurrent problems, and especially the large subset of "embarrassingly parallel" ones, it does not make much sense to spawn more processes than you have processors.
  • The Python GIL prevents two threads in the same process from running in parallel, i.e. from multiple cores executing instructions literally at the same time.
  • Therefore threads in Python are relatively useless for speeding up CPU-bound tasks, but they can still be very useful for I/O-bound tasks, because blocking operations (e.g. waiting for network data) release the GIL so that another thread can run while the first one waits (see the sketch after this list).
  • If you have multiple processors, you can have true parallelism by spawning multiple processes despite the GIL. This is only worth it for CPU-bound tasks, and often you have to consider the overhead of spawning processes and the communication cost between processes.
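
As a minimal sketch of the I/O-bound case (the sleep just stands in for a blocking network call, and the task count and delay are made up for illustration), threads overlap the waiting even on a single core because the blocking call releases the GIL:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def fake_io(i):
        # stands in for a blocking network call; time.sleep releases the GIL
        time.sleep(1)
        return i

    start = time.perf_counter()
    for i in range(5):
        fake_io(i)
    print(f"sequential: {time.perf_counter() - start:.1f}s")  # ~5s

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=5) as pool:
        list(pool.map(fake_io, range(5)))
    print(f"threaded:   {time.perf_counter() - start:.1f}s")  # ~1s, even on one core
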
timgeb
  • Probably I was not able to get my question across clearly enough. Since I have one core and I spin up 3 computationally intensive processes, the OS would have to context-switch between them. So with one core, running 3 processes in parallel can in no way be achieved. Kindly let me know if my understanding is correct – Subhayan Bhattacharya Sep 23 '18 at 15:15
  • 1
    @user1867151 you understood correctly. Concurrency is about dealing with lots of things at once and can be achieved by context switching; parallelism, on the other hand, is about doing lots of things at exactly the same time. For true parallelism, you need more than one processor. – timgeb Sep 23 '18 at 15:31
2

You CAN use both multithreading and multiprocessing on single-core systems.

The GIL limits the usefulness of multithreading in pure Python for computation-bound tasks, no matter your underlying architecture. For I/O-bound tasks, threads work perfectly fine. If they had no use at all, they probably would not have been implemented in the first place.

For pure Python software, multiprocessing is always a safer choice when it comes to parallel computing. Of course, multiple processes are more expensive than multiple threads (processes do not share memory, unlike threads, and they come with a somewhat higher overhead).
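
A minimal sketch of that trade-off, with a toy CPU-bound function and an arbitrary worker count (on a multi-core machine the chunks run truly in parallel despite the GIL; on a single core you only pay the process overhead):

    from multiprocessing import Pool

    def cpu_heavy(n):
        # purely CPU-bound toy workload
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with Pool(processes=4) as pool:  # arbitrary worker count
            results = pool.map(cpu_heavy, [2_000_000] * 4)
        print(results)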

For single-processor machines, however, multiprocessing (and multithreading) buys you little to no extra speed for computationally heavy tasks, and may actually even slow you down a bit. But if the OS supports them (which is pretty common for desktops, workstations, clusters, etc., though perhaps not for embedded systems), they allow you to effectively run multiple I/O-bound programs simultaneously.

Long story short, it depends a bit on what you are doing...

norok2
1

The multiprocessing module basically spawns multiple instances of the Python interpreter, so the GIL is not a worry.

multiprocessing uses the same API as the threading module, if you have used that previously.
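
For example, a minimal sketch of that API parity (the work function is just a placeholder); only the imported class changes:

    from threading import Thread
    from multiprocessing import Process

    def work(label):
        print(f"hello from {label}")

    if __name__ == "__main__":
        t = Thread(target=work, args=("a thread",))
        p = Process(target=work, args=("a process",))
        t.start(); p.start()
        t.join(); p.join()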


You seem to be confused between multiprocessing, threading (what you are referring to as multithreading) and an X-core processor.

  • No matter what, when you start Python (the CPython implementation), it will only use one core of your processor.
  • Threading distributes the load between the different components of the script. Suppose you have to interact with an external API: your script has to wait for the communication to finish before it can proceed. If you are making multiple similar calls, this takes linear time, whereas if you use threading, you can make those calls concurrently (see the sketch below).
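
A minimal sketch of that pattern, with time.sleep standing in for the external API call (the call count and delay are arbitrary):

    import time
    from threading import Thread

    def call_api(i):
        # placeholder for a real API call; the sleep models network latency,
        # which releases the GIL so the other threads can wait concurrently
        time.sleep(2)
        print(f"call {i} finished")

    threads = [Thread(target=call_api, args=(i,)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # all three calls finish in roughly 2 seconds instead of 6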

See also: PyPy implementation of Python

Santosh Kumar