I've successfully implemented a multiprocessing script on Windows, but the same script raises "RuntimeError: already started" on Linux and stops executing. The script consists of the following "main.py" (some parts omitted for readability):

from multiprocessing import freeze_support

import mod2

if __name__ == '__main__':
    #MULTIPROCESSING STUFF
    freeze_support()

    #DO SOME STUFF

    #Call my multiprocessing-function in other module
    mod2.func(tileTS[0], label, areaconst)

And the "mod2.py" module:

import numpy as np
from multiprocessing import Pool
from functools import partial
import os, time

def func(ts, label, areaconst):
    #SETTING UP/LOADING SOME VARIABLES

    for idx in totImgs:            
        img_ = myList[idx]      

        p = Pool(2)
        result = p.map(  partial(_mp_avg, var1=var1_, img=img_), range(totObjs) ) 

        p.close()
        p.join()

        #MANAGE RESULTING VARIABLES

    return None


def _mp_avg(idx, img, var1):
    num = idx + 1
    arr = img[var1==num]
    if np.isnan(arr).any():
        return np.nan 
    else:
        return np.sum( arr )  

The error is raised when the script calls `Pool.map`. The same code works flawlessly on Windows.

I'm using Ubuntu 18.04 and launching the Python 3.6.7 script from Visual Studio Code.

EDIT: added a screenshot of the runtime error(s): [image: terminal error message]

C. Daniel
  • Smells like an IDE thing, have you tried running it from terminal? – Darkonaut Apr 10 '19 at 15:28
  • I cannot run your code as it is missing variable declarations so I cannot debug it. However I noticed you are using `partial` inside of `map`, the correct way would be using [starmap](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.starmap) as you can see in [this question](https://stackoverflow.com/questions/5442910/python-multiprocessing-pool-map-for-multiple-arguments). If `partial` generates side-effects this way then Windows won't be affected by it because of how it creates new processes – lucasgcb Apr 10 '19 at 15:29
  • @lucasgcb unfortunately the dataset is quite massive, that's why I didn't bother sharing it. I tried your suggested solution but it still generates the same error (see "EDIT"). – C. Daniel Apr 10 '19 at 16:24
  • @Darkonaut running from terminal does make the script run, but I'd still like to make it also work on my IDE (for easy debugging purposes). Note that the script does run on windows while using Visual Studio Code. – C. Daniel Apr 10 '19 at 16:26
  • Well at least you know now it's due to visual-studio-code not working and it's not a Python problem. VS code is a Microsoft product, so it's not a surprise that it works with windows. Maybe the `freeze_support()` is confusing VS code on Linux, I would try without that line. – Darkonaut Apr 10 '19 at 16:35
  • @C.Daniel you do not need to process the entire dataset to test your function. A few mock units would be enough to test the behavior. – lucasgcb Apr 10 '19 at 17:01
  • @lucasgcb I'm dealing with satellite data: one image is still almost 1GB. That said, if you wanna try it out, "img" is simply an image, while "var1" is a mask labelling different parts of the image. The code simply runs through the total number of labels ("totObjs"). For each label it selects only the pixels belonging to it and computes their sum. – C. Daniel Apr 11 '19 at 06:42
  • @Darkonaut I tried removing it but it doesn't fix the issue. – C. Daniel Apr 11 '19 at 06:43
  • @C.Daniel Sorry Daniel, but I'm not going to hack through your code and compose a mock dataset for you. You're more likely to get help if you offer a reasonable way to replicate the issue - This is not only helpful for us, it's also helpful for your own testing endeavors; you shouldn't need a huge picture to unit test this functionality and we shouldn't need to invest to begin observing its behavior. – lucasgcb Apr 11 '19 at 07:23
  • It's because the debugger (ptvsd) is not fork-safe ([github issue](https://github.com/Microsoft/ptvsd/issues/1046#issuecomment-443339930)). Currently your only options are to change the start method to "spawn" (`multiprocessing.set_start_method("spawn")`) or to switch your IDE. – Darkonaut Apr 11 '19 at 15:18

1 Answer

As pointed out by @Darkonaut, Visual Studio Code uses ptvsd as its debugger, which isn't fork-safe (https://github.com/Microsoft/ptvsd/issues/1046#issuecomment-443339930). Since the default start method on Linux is "fork" (which relies on os.fork()), the script raises a RuntimeError when run under the VS Code debugger. This does not happen on Windows, where "spawn" is the only available start method. Solutions on Linux are:

  • Change the start method by adding the following line once, inside the `if __name__ == '__main__':` block and before any Pool is created:

    multiprocessing.set_start_method("spawn")
    
  • Edit code with VSCode and launch from Terminal.

  • Change IDE.

  • Wait for a fork-safe debugger update, which is supposedly in progress.
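
For the first option, a minimal sketch of where the call could go. `label_sum` and `run` are hypothetical stand-ins for `_mp_avg` and `mod2.func`, and plain lists replace the NumPy arrays so the example stays self-contained:

```python
import multiprocessing as mp
from functools import partial


def label_sum(label, pixels, mask):
    """Sum the pixels whose mask value equals `label` (stand-in for _mp_avg)."""
    return sum(p for p, m in zip(pixels, mask) if m == label)


def run(pixels, mask, labels):
    # One pool handles all labels; leaving the `with` block shuts it down.
    with mp.Pool(2) as pool:
        return pool.map(partial(label_sum, pixels=pixels, mask=mask), labels)


if __name__ == "__main__":
    # Call this at most once, before any Pool or Process is created.
    mp.set_start_method("spawn")
    print(run([1.0, 2.0, 3.0, 4.0], [1, 2, 1, 2], [1, 2]))
```

With "spawn", each worker re-imports the main module, so the worker function has to live at module level and the pool creation has to stay behind the `__main__` guard.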

Check the following link for further information about the problem: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
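
The linked documentation section also describes `multiprocessing.get_context()`, which may be handy if you would rather not change the start method globally. A rough sketch, assuming a hypothetical worker `square` and helper `squares_with_spawn` (neither is from the original script):

```python
import multiprocessing as mp


def square(x):
    """Trivial module-level worker so it can be pickled for the child processes."""
    return x * x


def squares_with_spawn(values):
    # The context object exposes the same API as the multiprocessing module,
    # but is bound to "spawn"; the global default start method is left alone.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(squares_with_spawn([1, 2, 3]))
```

Only pools created from `ctx` use "spawn" here, so other code that relies on the default start method is unaffected.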

C. Daniel