104

I have three large lists. The first contains bitarrays (module bitarray 0.8.0) and the other two contain arrays of integers.

l1 = [bitarray 1, bitarray 2, ..., bitarray n]
l2 = [array 1, array 2, ..., array n]
l3 = [array 1, array 2, ..., array n]

These data structures take quite a bit of RAM (~16GB total).

If I start 12 sub-processes using:

multiprocessing.Process(target=someFunction, args=(l1,l2,l3))

Does this mean that l1, l2 and l3 will be copied for each sub-process or will the sub-processes share these lists? Or to be more direct, will I use 16GB or 192GB of RAM?

someFunction will read some values from these lists and then perform some calculations based on the values read. The results will be returned to the parent process. The lists l1, l2 and l3 will not be modified by someFunction.

Therefore I would assume that the sub-processes do not need, and would not, copy these huge lists, but would instead just share them with the parent, meaning the program would take 16GB of RAM (regardless of how many sub-processes I start) due to the copy-on-write approach under Linux. Am I correct, or am I missing something that would cause the lists to be copied?

EDIT: I am still confused after reading a bit more on the subject. On the one hand, Linux uses copy-on-write, which should mean that no data is copied. On the other hand, accessing an object will change its ref-count (I am still unsure why, and what that means). Even so, will the entire object be copied?

For example if i define someFunction as follows:

import random

def someFunction(list1, list2, list3):
    i = random.randint(0, 99999)
    print list1[i], list2[i], list3[i]

Would using this function mean that l1, l2 and l3 will be copied entirely for each sub-process?

Is there a way to check for this?
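One way to check this on Linux might be to compare each sub-process's proportional set size (Pss) from /proc before and after the lists are accessed: if the pages are really shared, Pss should barely move, while if they are copied on access it should jump by roughly the size of the touched data. A rough helper (the function name is made up) could look like this:

def pss_kb(pid='self'):
    # Sum the Pss entries of /proc/<pid>/smaps; values are reported in kB.
    total = 0
    with open('/proc/%s/smaps' % pid) as f:
        for line in f:
            if line.startswith('Pss:'):
                total += int(line.split()[1])
    return total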

EDIT2: After reading a bit more and monitoring the total memory usage of the system while the sub-processes are running, it seems that entire objects are indeed copied for each sub-process, and it seems to be because of reference counting.

The reference counting for l1, l2 and l3 is actually unneeded in my program. This is because l1, l2 and l3 will be kept in memory (unchanged) until the parent process exits. There is no need to free the memory used by these lists until then. In fact, I know for sure that the reference count will remain above 0 (for these lists and every object in these lists) until the program exits.

So now the question becomes: how can I make sure that the objects will not be copied to each sub-process? Can I perhaps disable reference counting for these lists and for each object in these lists?

EDIT3: Just an additional note: the sub-processes do not need to modify l1, l2 and l3 or any objects in these lists. They only need to be able to reference some of these objects without causing the memory to be copied for each sub-process.

FableBlaze
  • http://stackoverflow.com/questions/10721915/shared-memory-objects-in-python-multiprocessing Similar question and your answer. – sean Jan 02 '13 at 15:39
  • Read through it and I'm still unsure about the answer. Will the entire object(s) be copied? Only a part of the object? Only the page containing the refcount? How can I check? – FableBlaze Jan 02 '13 at 17:57
  • Due to copy-on-write, I think you shouldn't have to do anything special. Why not just try it? – NPE Jan 03 '13 at 08:27
  • 3
    Tried it and the lists were copied. This seems to be because if I do l1_0=l1[0] in a subprocess then this increases the reference counter of l1. So although I haven't changed the data, I have changed the object, and this causes the memory to be copied. – FableBlaze Jan 03 '13 at 09:56
  • 2
    @anti666 thanks very much for this post / question. I think I'm running into some of the same issues with reference counting and the like. Have you tried a Numpy array, to at least reduce the objects for which references might be counted? Also, since you didn't mention your measurement method, make sure to use `smem`'s PSS stat; just looking at RSS doesn't show you anything useful, since it double-counts shared memory. – gatoatigrado Jan 08 '14 at 22:41
  • I did not try Numpy arrays, because by that point we had already decided to move to C++, which solved the problem for me. – FableBlaze Feb 03 '14 at 09:33

5 Answers

92

Because this still ranks very high on Google and no one else has mentioned it yet, I thought I would mention the possibility of 'true' shared memory, which was introduced in Python 3.8.0: https://docs.python.org/3/library/multiprocessing.shared_memory.html

I have included a small contrived example here (tested on Linux) that uses NumPy arrays, which is likely a very common use case:

import time

import numpy as np
from multiprocessing import shared_memory, Process, Lock
from multiprocessing import cpu_count, current_process

# one dimension of the 2d array which is shared
dim = 5000

lock = Lock()  # inherited by the children when they are forked


def add_one(shr_name):
    # Attach to the existing shared memory block by name and view it as a NumPy array.
    existing_shm = shared_memory.SharedMemory(name=shr_name)
    np_array = np.ndarray((dim, dim), dtype=np.int64, buffer=existing_shm.buf)

    lock.acquire()
    np_array[:] += 1  # add one to every element, in place, in the shared block
    lock.release()

    time.sleep(10)  # pause, to see the memory usage in top
    print('added one')
    existing_shm.close()


def create_shared_block():
    a = np.ones(shape=(dim, dim), dtype=np.int64)  # start with an existing NumPy array

    shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
    # Now create a NumPy array backed by shared memory
    np_array = np.ndarray(a.shape, dtype=np.int64, buffer=shm.buf)
    np_array[:] = a[:]  # copy the original data into shared memory
    return shm, np_array


if current_process().name == "MainProcess":
    print("creating shared block")
    shr, np_array = create_shared_block()

    processes = []
    for i in range(cpu_count()):
        _process = Process(target=add_one, args=(shr.name,))
        processes.append(_process)
        _process.start()

    for _process in processes:
        _process.join()

    print("Final array")
    print(np_array[:10])
    print(np_array[-10:])

    shr.close()
    shr.unlink()
Note that because of the 64-bit ints this code can take several hundred MB of RAM to run (the 5000x5000 int64 array alone is about 200 MB, and it exists both as a normal array and as a shared block), so make sure that you won't freeze your system using it. ^_^

Rboreal_Frippery
  • Dear @Rboreal_Frippery, thank you for your great answer. I was wondering whether there is an alternative approach to ensure that the number of Processes generated does not surpass the number of cores in the CPU, something like the multiprocessing.Pool object. If there is such an approach, how would one implement it using Processes? – Philipe Riskalla Leal Jun 22 '21 at 14:04
  • 1
    @PhilipeRiskallaLeal processes don't inherently take up a whole core. You can have more processes than CPU cores... – KetZoomer Jul 29 '21 at 22:31
  • 3
    Thanks for this great answer. Just wanted to link to a similar answer to this, which includes a memory tracing comparison: https://mingze-gao.com/posts/python-shared-memory-in-multiprocessing/ – ZaxR Jul 30 '21 at 21:46
  • @Rboreal_Frippery I do not understand the difference between what you call "true" shared memory and the shared memory where that ugly bug exists; I thought the bug lies in the way multiprocessing.shared_memory is managed? Could you please explain? – jpp1 Jan 27 '23 at 08:38
  • @jpp1 which "ugly bug" are you referring to? – Rboreal_Frippery Jan 27 '23 at 16:13
  • This ugly bug: https://github.com/python/cpython/issues/82300. I could not believe that this, though present since its introduction in Python 3.8 and still there in 3.11, is not even mentioned in the docs! – jpp1 Jan 28 '23 at 16:17
  • 1
    Hi @jpp1; at the time of writing this, the only way for multiprocessing to work on a large array in multiple processes at once involved the data being copied to every process, taking large amounts of memory. The bug you mention seems to involve a case where arbitrary processes are accessing shared memory outside of a multiprocessing queue / manager / etc. This is outside the scope of any use case I've encountered and so is not addressed here. – Rboreal_Frippery Jan 30 '23 at 14:48
75

Generally speaking, there are two ways to share the same data:

  • Multithreading
  • Shared memory

Python's multithreading is not suitable for CPU-bound tasks (because of the GIL), so the usual solution in that case is to use multiprocessing. However, with this solution you need to explicitly share the data, using multiprocessing.Value and multiprocessing.Array.

Note that sharing data between processes is usually not the best choice, because of all the synchronization issues; an approach involving actors exchanging messages is usually seen as a better choice. See also the Python documentation:

As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes.

However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so.

In your case, you need to wrap l1, l2 and l3 in some way understandable by multiprocessing (e.g. by using a multiprocessing.Array), and then pass them as parameters.
Note also that, since you said you do not need write access, you should pass lock=False while creating the objects, or all access will still be serialized.
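For instance, a minimal sketch of that wrapping for one of the integer lists (the names worker and l2_shared are made up for illustration; lock=False is only safe here because nothing writes) could look like this:

from multiprocessing import Process, Array

def worker(shared_arr):
    # Reads go directly against the shared block; nothing is copied per process.
    print(shared_arr[0], shared_arr[4])

if __name__ == '__main__':
    # lock=False skips the synchronization wrapper, which is fine for read-only use.
    l2_shared = Array('i', [10, 20, 30, 40, 50], lock=False)
    procs = [Process(target=worker, args=(l2_shared,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()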

rob
  • Can I use `multiprocessing.Array` to wrap lists of arbitrary objects such as `bitarray()`? – FableBlaze Jan 03 '13 at 10:22
  • @anti666: I think you should use `multiprocessing.sharedctypes` - see http://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.sharedctypes – rob Jan 03 '13 at 10:42
  • 1
    Alternatively, if bitarray supports the buffer protocol, you could share it as a bytearray, and then convert it back to a bitarray in the spawned processes. – rob Jan 03 '13 at 10:50
  • 1
    Decided to convert `l2` and `l3` into tuples of `multiprocessing.Array` objects, hoping that these objects (the largest part of the data) will not be entirely copied for each sub-process. This will alleviate the problem somewhat. The final solution will be rewriting the program in C, as it will be faster and will not have this problem. – FableBlaze Jan 04 '13 at 20:01
  • 2
    Using shared memory, you should not have that problem at all, also in Python. – rob Jan 05 '13 at 13:23
  • From the multiprocessing documentation: "Data can be stored in a shared memory map using `Value` or `Array`". Both `Value` and `Array` can contain only ctypes. Perhaps there is another module that does not have that limitation. However, C would not have that problem anyway, as I can take advantage of copy-on-write. Additionally it will run faster. – FableBlaze Jan 06 '13 at 11:49
  • 3
    multiprocessing.Value and multiprocessing.Array force you to use raw C datatypes. They do make sure memory is shared, but that's not as simple as just using Linux's CoW behavior, which the question post is asking about. I have a hunch the asker's hypothesis that reference counts are wrecking it is correct. – gatoatigrado Jan 08 '14 at 22:44
  • In Python, the actors approach runs into pickling limitations; for example, it is not possible to pickle decorated functions. – geckos Apr 17 '19 at 15:16
  • @rob, would it be fine to have a shared data structure as a dict() where the key is a string and the value is a counter? Each process identifies a string and updates the count in the shared dict() for the respective string. I am confused about whether proc 2 & proc 1 could pick up the same value and update it at the same time; this would be a problem and I wouldn't have a correct count! Any solution on how to circumvent the problem? – Anu Oct 17 '19 at 23:31
  • "Python's multithreading is not suitable for CPU-bound tasks (because of the GIL)" THis is false. It's unfortunate that this was the accepted answer because it is false. Alas, the tribe has decreed that it shall be sacrificed into the volcano. – Geoffrey Anderson May 26 '20 at 13:55
17

For those interested in using Python 3.8's shared_memory module: it still has a bug (github issue link here) which hasn't been fixed and affects Python 3.8/3.9/3.10 as of now (2021-01-15). The bug affects POSIX systems and is about the resource tracker destroying shared memory segments while other processes should still have valid access. So take care if you use it in your code.

Daniel
  • 1
    I experienced this resource tracker destroying the shared memory. As a workaround, I stored the shared memory in a list; since the shared memory is then linked to a data structure, the resource tracker cannot destroy it. My Python version is 3.8. – Nuran Feb 23 '21 at 02:35
  • 1
    The bug is still present as of 1/1/2022, but there seems to be a monkey-patch solution in the bug discussion for POSIX systems. For Windows, I got rid of the bug by removing these lines (~line 152) from Lib\multiprocessing\shared_memory.py: `finally: _winapi.CloseHandle(h_map)`. Just make sure to correctly unlink() yourself (I use atexit.register(shm.unlink)) and you should be good. – Jasmin Parent Jan 01 '22 at 22:31
11

If you want to make use of the copy-on-write feature and your data is static (unchanged in the child processes), you should keep Python from touching the memory blocks where your data lies. You can easily do this by using C or C++ structures (STL, for instance) as containers and providing your own Python wrappers that use pointers to the data memory (or possibly copy it) when a Python-level object is created, if one is created at all. All this can be done very easily, with almost Python-like simplicity and syntax, using Cython.

# pseudo cython
from libc.stdlib cimport malloc, free
from libc.string cimport memcpy

cdef class FooContainer:
    cdef char* data

    def __cinit__(self, char* foo_value):
        self.data = <char*> malloc(1024 * sizeof(char))
        memcpy(self.data, foo_value, min(1024, len(foo_value)))

    def __dealloc__(self):
        free(self.data)

    def get(self):
        return self.data

# python part
import os
from foo import FooContainer

f = FooContainer("hello world")
pid = os.fork()
if not pid:
    f.get()  # this call will read the same memory page to which the
             # parent process wrote the 1024 chars of self.data,
             # and Cython will automatically create a new Python string
             # object from it and return it to the caller

The above pseudo-code is badly written; don't use it as-is. In place of self.data there should be a C or C++ container in your case.

SanityIO
3

You can use memcached or Redis and set each list as a key-value pair: {'l1'...
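For example, a rough sketch with the redis-py client (this assumes a Redis server on localhost and uses pickle for serialization; the key name is just illustrative):

import pickle
import redis

r = redis.Redis(host='localhost', port=6379)

l1 = [b'\x01\x02', b'\x03\x04']   # stand-in for the real data
r.set('l1', pickle.dumps(l1))     # the parent stores each list once

# in a sub-process: fetch and deserialize only what is needed
l1_again = pickle.loads(r.get('l1'))

Each sub-process that reads a key gets its own deserialized copy on demand, so this trades resident memory for fetch-and-decode time rather than sharing pages outright.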

CrabbyPete
  • Redis is blocking, I think. So if the need is for multiple readers accessing the shared structure, then mp.Array/mp.Value might be a better solution. It all depends on the application. – Cryptoharf84 Sep 13 '19 at 15:48