0

I have created an application that uses the multiprocessing library. In it, I spin up two processes (instances of the same class) that consume data from Kafka and put it into a Python Queue.

This is the library I used: Python multiprocessing

Q1. Is it concurrency or is it parallelism? 
Q2. Is it multithreading or is it multiprocessing? 
Q3. How does Python map processes to CPUs? (Does this question make sense?)

I understand in order to speak about multithreading I need to use separate / multiple CPUs (so separate threads are mapped to separate CPU threads).

I understand in order to speak about multiprocessing I need to use separate memory space for both processes? Is it correct?

I assume if I spin two processes within one Application instance => we talk about concurrency.

If I spin up multiple instances of the above application, would we then talk about parallelism (multiple CPUs, separate memory spaces used)?

I see that Python library defines it as follows: Python multiprocessing library

The multiprocessing package offers both local and remote concurrency

...

Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine.

...

A prime example of this is the Pool object which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism).
Dariusz Krynicki

2 Answers

1

Q1: It is at least concurrency, and can be parallelism as well (using the terms as defined in the answer to this question). Clearly, if you have only one processor, true parallelism cannot be achieved, because only one process can use the CPU at any given time. In that case, however, the multiprocessing library still allows you to define multiple tasks that run in separate processes. The OS scheduler decides which process runs when.

Q2: Multiprocessing (...which is kind of implied by the library name). Due to the Global Interpreter Lock (GIL) present in most Python interpreter implementations, threads cannot execute Python code in parallel. Multiprocessing offers a threading-like interface that uses processes under the hood.
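
To make the process-versus-thread distinction concrete, here is a minimal sketch. The names `counter`, `worker`, and `run_demo` are made up for illustration and are not taken from the question's application. Each `multiprocessing.Process` gets its own PID and its own copy of module-level state, so a change made in a child is not visible in the parent.

```python
import multiprocessing as mp
import os

counter = 0  # module-level state; every process works on its own copy


def worker(queue):
    """Increment this process's copy of `counter` and report it with the PID."""
    global counter
    counter += 1
    queue.put((os.getpid(), counter))


def run_demo(n_procs=2):
    """Start n_procs processes; return (parent_pid, parent_counter, reports)."""
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(queue,)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    # Drain the queue before joining so the children can exit cleanly.
    reports = [queue.get(timeout=30) for _ in procs]
    for p in procs:
        p.join()
    return os.getpid(), counter, reports


if __name__ == "__main__":
    parent_pid, parent_counter, reports = run_demo()
    print("parent:", parent_pid, "counter =", parent_counter)
    for pid, value in reports:
        print("child: ", pid, "counter =", value)
```

The parent's `counter` stays unchanged even though each child incremented its own copy, which is the "separate memory space" the question asks about.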

Q3: It doesn't. Python spawns processes, and the OS scheduler decides which one runs where and when. There are some ways to execute processes on specific CPUs, but this is not the default behaviour of multiprocessing (and I'm not aware of any way to force the library to pin processes to CPUs).
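
As one example of such a way, and only as a sketch: on Linux the standard library exposes `os.sched_getaffinity` and `os.sched_setaffinity`. These are OS-level calls, not a feature of `multiprocessing`, and they do not exist on every platform, so the hypothetical helpers below (`allowed_cpus`, `pin_current_process`) fall back to `None` where they are missing.

```python
import os


def allowed_cpus():
    """Return the set of CPUs this process may run on, or None if unsupported."""
    if hasattr(os, "sched_getaffinity"):
        return os.sched_getaffinity(0)  # 0 means "the calling process"
    return None


def pin_current_process(cpu):
    """Restrict the calling process to a single CPU; return the new affinity set.

    Returns None on platforms without os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    before = allowed_cpus()
    print("allowed before:", before)
    if before is not None:
        print("allowed after pinning:", pin_current_process(min(before)))
```

A worker could call something like `pin_current_process` at the start of its `Process` target, but without such a call placement is left entirely to the OS scheduler.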

GPhilo
1

First, separate threads are not necessarily mapped to separate CPUs. That's optional, and in Python, due to the GIL, only one thread in a process executes Python code at any given time.

1) It's both concurrency, in that the order of execution is not set, and parallelism, since the multiprocessing package can run on multiple processors, bypassing the GIL limitations.

2) Since the threading package is another story, it's definitely multiprocessing.

3) I may be speaking out of line, but Python, IMO, does NOT map processes to CPUs; it leaves this detail to the OS.
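
As a rough illustration of point 1 (a sketch only; `square_with_pid` and `parallel_squares` are hypothetical names), a `multiprocessing.Pool` spreads the inputs across worker processes, which the OS is then free to schedule on different cores:

```python
import multiprocessing as mp
import os


def square_with_pid(x):
    """Return x squared together with the PID of the worker that computed it."""
    return x * x, os.getpid()


def parallel_squares(values, n_workers=2):
    """Distribute `values` across a pool of worker processes (data parallelism)."""
    with mp.Pool(processes=n_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return pool.map(square_with_pid, values)


if __name__ == "__main__":
    for square, pid in parallel_squares(range(8)):
        print(square, "computed in process", pid)
```

Whether the workers actually run at the same instant depends on how many cores are free; the code only makes parallel execution possible.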

omu_negru
  • So, it would mean we are not sure if Python does multithreading. I guess in Java => JVM maps app threads (or JVM threads?) to OS threads? – Dariusz Krynicki Oct 24 '17 at 08:31
  • In the JVM, Java leaves the threads to the OS scheduler, as far as I know, but the JVM does not suffer from the GIL limitations – omu_negru Oct 24 '17 at 08:33
  • You can be sure about Python as long as you know which implementation you're using. CPython, the most common one, has the GIL problem and can't do effective multithreading – GPhilo Oct 24 '17 at 08:36
  • How can you be sure about Python if Python has not got anything within its architecture that maps application threads to CPU threads? If I have 4 threads available within my CPU only and I do spin 8 threads in Python I assume that Python application will do it and will spin those 8 threads but OS will map it to 4 CPU threads. So, it is not real multithreading, is it not? – Dariusz Krynicki Oct 24 '17 at 08:38
  • you're right of course, but i assumed that was implied. – omu_negru Oct 24 '17 at 08:38
  • what do you mean by having 4 threads available within your CPU? – omu_negru Oct 24 '17 at 08:39
  • @BlueTomato Effectively, all application threads are mapped to **one** CPU thread. The Python interpreter runs in only one thread at a time (that's what the GIL entails), so when you spawn *Python threads* you only get concurrency, because each thread will execute for some time, get paused, and leave the CPU available for another thread. You do not, however, get parallelism. – GPhilo Oct 24 '17 at 08:43
  • @omu_negru: CPU thread - a single process (line of code currently being executed) in the CPU core. So, I imagine IF Python runs one thread => it maps to one CPU thread in a CPU core. IF Python runs multiple threads and I have a CPU with 4 cores (each core runs one thread) => I could run up to 4 threads in Python BUT I believe there is no mapping guaranteed between Python and CPU cores? [does what I wrote make sense, or am I wrong? please correct me if it does not] – Dariusz Krynicki Oct 24 '17 at 08:51
  • There is no guarantee that you'll be mapped to separate CPUs. That's the OS's job, since a Python thread is basically an OS thread. That does not matter, since the GIL will put a lock on all Python functions, so your code will run as if it were single-threaded anyway (except for heavy I/O code). Read about the GIL [here](https://wiki.python.org/moin/GlobalInterpreterLock) – omu_negru Oct 24 '17 at 08:55
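
The statement in the last comment that a Python thread is basically an OS thread can be checked with a small sketch. It assumes Python 3.8 or newer, where `threading.get_native_id` is available, and `collect_native_ids` is a hypothetical helper name. Each Python thread reports a different kernel-assigned thread ID, even though the GIL lets only one of them run Python code at a time.

```python
import threading


def collect_native_ids(n_threads=4):
    """Start n_threads threads and return the OS-level thread ID seen by each."""
    ids = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker():
        with lock:
            ids.append(threading.get_native_id())
        # Wait until every thread has reported, so all threads are alive at
        # the same time and the OS cannot hand out the same ID twice.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


if __name__ == "__main__":
    print("main thread:   ", threading.get_native_id())
    print("worker threads:", collect_native_ids())
```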