First, I want to note that there are many posts on this topic, e.g., 1, 2, 3. Here are a few that I've looked to for a solution: 4, 5.
Here is the problem: I'm trying to improve the performance of some Python code using multiprocessing. The straightforward approach of running the primary computation with pool.apply_async improved performance by ~50%, but that is inadequate. Profiling indicated that much of the remaining time is overhead from forking the processes.
I thought an implementation that creates a job queue, a result queue, and a fixed number of processes (one per CPU) would outperform this. However, it performs slightly worse than the apply_async solution; in this case the overhead is in putting to and getting from the queues. The data is moderate in size.
OK, I think: my data is static, i.e., I don't write to it after the fork, so I'll make the data that I'm pushing through a pipe/queue global! Searching, I found support for this idea in other posts, here and elsewhere (see above).
So I wrote a simple example (see mp.py, mp_globals.py), and it works great! Here is the output for a small case. It uses a multiprocessing.Pool with the number of CPUs - 1 processes.
Pseudo-code:
serve: there are N of these; they spin on the job queue until they get a work-item dictionary, which they pass to a worker.
worker: takes the work-item, does the work, and pushes the result to the result queue.
There is bookkeeping to keep track of how much work has been done, etc.
The problem: when I apply this code to the actual problem, it fails because some of the data in the forked context is no longer valid. One item, the args_dict, is valid, but a list, another dict, and a class object are either empty or invalid. The example code works with all of these data types. I added prints that show id(object) for these four items, and only the args_dict has the same id value.
To be clear, the global objects are already invalid before any work is done on them. My PManager class works fine both when data is pushed through queues and when objects are global. Before running the workers, the code assigns and checks the globals, and they are correct. On entry to the worker code, 3 of the 4 values are bad, but the fourth is correct. The test code is only in two files; the project code is in many, but the interaction is between three files -- this does not seem like it should be a problem.
I tried using gc.freeze() and got the same result.
All suggestions are welcome.
P.S. I did not try a multiprocessing.Manager solution; based on comments I read and my understanding of how it works, I think it is inappropriate for this problem.