
I am currently trying to parallelize a rather large task: computing a complex system of differential equations. I want to parallelize the computation so each computation runs in its own process. I need the results to be ordered, therefore I am using a dictionary to sort them after the processes finish. I am also on Windows 10.

For now I am only running the identity function to check the code, but even then it simply runs all logical cores at 100% and never computes anything (I waited 5 minutes). Later on I will need to initialize each process with a bunch of variables to compute the actual system, defined in a solver() function further up in the code. What is going wrong?

import multiprocessing as mp
import numpy as np

Nmin = 0
Nmax = 20
periods = np.linspace(Nmin, Nmax, 2*Nmax +1) # 0.5 steps

results = dict()

def identity(a):
    return a

with mp.Manager() as manager:
    sharedresults = manager.dict()

    with mp.Pool() as pool:
        print("pools are active")
        for result in pool.map(identity, periods): 
            #sharedresults[per] = res
            print(result)

orderedResult = []
for k,v in sorted(results.items()):
    orderedResult.append(v)

The program gets to the "pools are active" message and, after printing it, just does nothing.

I am also using JupyterLab, not sure whether that is an issue.

mkrieger1
derdotte
  • change `pool.map` to `pool.imap`. – Ahmed AEK Dec 01 '22 at 18:48
  • @AhmedAEK that also has the same problem – derdotte Dec 01 '22 at 18:51
  • For me, when using IDLE it already blocks when calling `mp.Manager()`. When starting Python from the command prompt directly, it doesn't block. Maybe try that. Then it might be an issue with using Jupyterlab. – mkrieger1 Dec 01 '22 at 19:25
  • When running the code from the command prompt it raises an exception about not using `if __name__ == '__main__'`. When moving everything from `with mp.Manager() ...` to the end inside an `if __name__ == '__main__'` block, the code seems to run without issue. – mkrieger1 Dec 01 '22 at 19:29
  • You aren't using an `if __name__ == "__main__"` guard! Have you read the documentation? This is crucial, particularly on Windows – juanpa.arrivillaga Dec 01 '22 at 20:04
  • So, you don't have to go to an "actual IDE", or rather, *unix is your IDE*. Or, I guess, Windows. Learn to use the terminal and various tools. Or just use VSCode. But learn to use the terminal there as well. – juanpa.arrivillaga Dec 02 '22 at 11:34
  • no worries. I am a physics student and CS student. Jupyter is simply much faster to program in if you do not need the full power of an IDE. However, seeing the limitations of Jupyter makes it an easy reason to change. – derdotte Dec 03 '22 at 12:50
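The guard fix mkrieger1 and juanpa.arrivillaga describe can be sketched as a standalone script using the question's own names (the `run()` helper is added here for illustration):

```python
# Minimal sketch of the question's code with the `if __name__ == "__main__"`
# guard the comments call for. On Windows, multiprocessing spawns child
# processes that re-import this module, so everything that starts workers
# must live under the guard.
import multiprocessing as mp
import numpy as np

Nmin = 0
Nmax = 20
periods = np.linspace(Nmin, Nmax, 2 * Nmax + 1)  # 0.5 steps

def identity(a):
    return a

def run():
    with mp.Pool() as pool:
        print("pools are active")
        # pool.map blocks until all workers are done and returns the
        # results in input order
        return pool.map(identity, periods)

if __name__ == "__main__":
    orderedResult = run()
    print(orderedResult)
```

Run this from a terminal (`python script.py`) rather than a notebook cell: in JupyterLab the worker function lives in the notebook's `__main__`, which spawned child processes cannot import.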

1 Answer


There's a known problem with the `multiprocessing` module inside JupyterLab (functions defined in a notebook live in the notebook's `__main__`, which spawned worker processes cannot import), so you should use `pathos` instead.

import multiprocessing as mp
import numpy as np
import scipy.constants as constants  # needed by the real solver() later
import pathos.multiprocessing as mpathos

Nmin = 0
Nmax = 20
periods = np.linspace(Nmin, Nmax, 2*Nmax +1) # 0.5 steps

results = dict()

def identity(a):
    return a

with mp.Manager() as manager:
    sharedresults = manager.dict()

    with mpathos.Pool() as pool:
        print("pools are active")
        for result in pool.imap(identity, periods): 
            #sharedresults[per] = res
            print(result)

orderedResult = []
for k,v in sorted(results.items()):
    orderedResult.append(v)
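For the ordering that the commented-out `sharedresults[per] = res` line was aiming at, a `Manager` dict isn't actually needed: since `imap` yields results in input order, they can be collected into an ordinary dict in the parent process. A minimal sketch with plain `multiprocessing` (swap in `mpathos.Pool` as above when running under JupyterLab; the `run()` helper is added for illustration):

```python
# Sketch of the dict-based ordering the question wanted: store each result
# under its period, then sort the keys at the end. No shared dict is needed
# because the parent process collects the results itself.
import multiprocessing as mp
import numpy as np

periods = np.linspace(0, 20, 41)  # 0.5 steps

def identity(a):
    return a

def run():
    results = {}
    with mp.Pool() as pool:
        # zip inputs with outputs; imap preserves input order
        for per, res in zip(periods, pool.imap(identity, periods)):
            results[per] = res
    # sort by key (the period) and keep only the values
    return [v for _, v in sorted(results.items())]

if __name__ == "__main__":
    print(run())
```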
Ahmed AEK
  • could you add "import scipy.constants as constants" at the top? – derdotte Dec 01 '22 at 19:08
  • or, well let me minimize the example – derdotte Dec 01 '22 at 19:09
  • @derdotte pathos should fix this, it has to do with how the multiprocessing module is implemented. – Ahmed AEK Dec 01 '22 at 19:25
  • Yeah, it fixed the minimal problem. I posted the following problem as it raises an exception when I switch to the c_i method. It says `NameError: name np is not defined` – derdotte Dec 01 '22 at 19:31
  • "there's a problem with multiprocessing and jupyterlab, so you should use pathos instead." this to me doesn't seem reasonable - the reasonable conclusion would just be "don't use jupyterlab" – juanpa.arrivillaga Dec 01 '22 at 20:10
  • @juanpa.arrivillaga there are some situations where it is more convenient or feasible to use JupyterLab, such as when you SSH into a remote server and need an interactive Python execution environment on it ... basically what Google Colab and a few other sites do, though they have fixed the multiprocessing problem. – Ahmed AEK Dec 02 '22 at 09:38
  • @AhmedAEK that doesn't make much sense to me at all. I do much of my work in the terminal via an SSH session on various remote servers. Usually, I'll stick to tmux, vim, and an IPython repl, but nowadays, it is super easy to just use a text editor like VSCode (just as an example) which will have [a turnkey solution](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack) to setting up remote development through an ssh connection. – juanpa.arrivillaga Dec 02 '22 at 11:31
  • Now, I understand that notebooks can be nice for exploratory data analysis, but if you are busting out multiprocessing/pathos, maybe it's time to consider something else. Anyway, I do think this answer could be helpful to other people in the future, and it is good to know about this limitation for jupyter notebooks, I wasn't aware at all! – juanpa.arrivillaga Dec 02 '22 at 11:32
  • That's the point. I was not aware of the limitations in Jupyter. Yesterday I ran a parameter study for 5000 parameters, and while it still took long, it was much more bearable than waiting days. I know that parameter studies are nowadays often done on the GPU; sadly no time to read into that as well. 5000 parameters took about 3 hours on 32 threads. That's fine by me. – derdotte Dec 03 '22 at 12:54