0

Imagine that a parallel computation is to be performed. Let us imagine that due to RAM requirements it is essential that memory is shared across sub-processes. Let us also assume that no shared memory is ever altered, but only read from. My question is if it can ever matter (in terms of performance) how the shared memory is referenced.

As an example consider the following (and let's assume we are on a system that uses Copy on Write such that a is not copied to each sub-process)

from multiprocessing import Pool

a = [5, 8, 0.1, 154]

def function( i ):
    return a[i]**2

with Pool(4) as p:
    result = p.map( function, [0,1,2,3] )

print(result)

out: [25, 64, 0.010000000000000002, 23716]

Will the four values of a (index 0,1,2,3) be accessible at the same time? Or will each sub process have to wait until the reference to the list a is "available" before being able to extract a[i]?

I suspect that the answer is that separate elements are available at the same time. But I want to be sure.

Thanks for any input :)

Hamada
  • 1,836
  • 3
  • 13
  • 27
Axiomel
  • 175
  • 8
  • 1
    Multiprocessing like this will make copies of the data in each process, there is no "shared list" here really. If you want to actually shared data between processes see [Sharing state between processes](https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes). – jdehesa Jun 22 '20 at 11:53
  • Hi jdehesa. Thanks for the comment :)! Are you sure that is true for a modern Linux machine? As I understand, the forks will only copy data on write. And this seems to be different when not on Linux. c.f The anwser of Francis Avila here: https://stackoverflow.com/questions/10721915/shared-memory-objects-in-multiprocessing – Axiomel Jun 22 '20 at 11:59
  • @jdehesa I'm also interested to know what you mean. AFAIU when `start_method` is set to `fork` then `a` will be available in the address space of all child processes. – Sam Mason Jun 22 '20 at 12:05
  • 1
    Ah I see, that's what you meant by that, right. Hm, I think it may depend on the [start method](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods), if you use _fork_ (default on Unix) it may be the case. But it's hard to tell, I don't know if accessing a list element may increase some reference counter, or if some part of the `multiprocessing` code (or even just the function calls) may change something in the heap of the interpreter, and since CoW works at page-level is difficult to guarantee... – jdehesa Jun 22 '20 at 12:10
  • @jdehesa Let us assume ```a``` it is never copied for the sake of argument. What is your opinion :)? Can all elements be accessed simultaneous? – Axiomel Jun 22 '20 at 12:25
  • 1
    @Axiomel I'd suggest looking into what "copy-on-write" means in a unix system. in this case, any reference to the variable `a` (i.e. in `function`) will cause its (Python) reference counter to be incremented. this will cause (at least) the page backing the part of the object that contains the reference counter to be copied (by the kernel). any other "shared" variables that are referenced will also trigger the same behaviour. – Sam Mason Jun 22 '20 at 12:38
  • 1
    @Axiomel If the memory would really not be copied between processes, then it would be immediately available to the subprocesses. If you want to get very specific, there may be a slight difference on first access due to the lower-level caches if the subprocess runs in a different processor, but otherwise everything should be transparent. – jdehesa Jun 22 '20 at 12:58

0 Answers0