
I have an application in Tkinter.

Part of this application is a method: it takes long lists of random values and checks whether they fall inside a previously defined grid, then writes the matches into another variable for export.

This is a rather slow process, so I would like to multiprocess it.

I've read up on how to do that; the resulting code is below.

I've searched SO for anything that might be relevant. I am running an up-to-date Spyder with Python 3.7 as part of the Anaconda suite on both machines, all (bundled) packages are up-to-date, and I've included the

    if __name__ == '__main__':

line. I've also experimented with the indentation of

    p.start()

and

    processes.append(p)

I simply can't get it to work.
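
For reference, this is how I understand the guard is supposed to be structured (a toy sketch with made-up names, not my actual code):

    import multiprocessing

    def worker(x, return_dict, i):
        return_dict[i] = x * x  # placeholder for the real computation

    if __name__ == '__main__':  # guard at module level, wrapping all process creation
        manager = multiprocessing.Manager()
        return_dict = manager.dict()
        processes = []
        for i in range(4):
            p = multiprocessing.Process(target=worker, args=(i, return_dict, i))
            p.start()
            processes.append(p)
        for p in processes:
            p.join()
        print(dict(return_dict))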

    import multiprocessing
    import numpy as np


    def ParallelStuff(myIn1, myIn2, myIn3, myIn4, anotherIn1, anotherIn2, anotherIn3, return_dict, processIterator):
        tempOut1 = np.zeros(len(myIn1))  # myIn1, myIn2, myIn3 are of the same length
        tempOut2 = np.zeros(len(myIn1))
        tempOut3 = np.zeros(len(myIn1))
        bb = 0

        for i in range(len(myIn3)):
            xx = myIn3[i]
            yy = myIn4[i]

            # grid positions whose x-coordinate matches xx
            hits = np.isin(anotherIn1, xx)
            goodY = anotherIn3[np.where(hits == 1)]

            if np.isin(yy, goodY):
                tempOut1[bb] = myIn1[i]
                tempOut2[bb] = myIn2[i]
                tempOut3[bb] = yy  # was `anotherIn3`, which assigns a whole array to one slot; the matched y-value is presumably meant
                bb += 1

        return_dict[processIterator] = [tempOut1, tempOut2, tempOut3]  # was [tempOut1, tempOut1, tempOut3]


    nCores = multiprocessing.cpu_count()


    def export_Function(self):
        out1 = np.array([])
        out2 = np.array([])
        out3 = np.array([])

        for loop_one in range(0, N):

            # ...
            # stuff that works on both systems with only one core...
            # ... and on linux with all cores
            processes = []
            nTotal = int(len(xRand))
            # chunk size: divide the work as evenly as possible across the cores
            if nTotal % nCores == 0:
                o = int(nTotal / nCores)
            else:
                o = int(nTotal / (nCores - 1))

            manager = multiprocessing.Manager()
            return_dict = manager.dict()

            for processIterator in range(nCores):
                offset = o * processIterator  # was `o*i`; `i` is undefined in this scope

                myIn1 = in1[offset : min(nTotal, offset + o)]
                myIn2 = in2[offset : min(nTotal, offset + o)]
                myIn3 = in3[offset : min(nTotal, offset + o)]
                myIn4 = in4[offset : min(nTotal, offset + o)]

                if __name__ == '__main__':
                    p = multiprocessing.Process(target=ParallelStuff, args=(myIn1, myIn2, myIn3, myIn4, anotherIn1, anotherIn2, anotherIn3, return_dict, processIterator))
                p.start()
                processes.append(p)

            for p in range(len(processes)):
                processes[p].join()

                myOut1 = return_dict[p][0]
                myOut2 = return_dict[p][1]
                myOut3 = return_dict[p][2]

                # keep only the slots that were actually filled (non-zero)
                out1 = np.concatenate((out1, myOut1[np.where(myOut1 != 0)]))
                out2 = np.concatenate((out2, myOut2[np.where(myOut2 != 0)]))
                out3 = np.concatenate((out3, myOut3[np.where(myOut3 != 0)]))

When I run my program on my Linux machine, it does exactly what it's supposed to do: distribute the work to all 8 cores, compute, concatenate the three results into the respective arrays, and export.

When I run my program on my Windows machine, the application's window freezes, the process becomes inactive, a new kernel automatically opens, and a new window appears.
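
For what it's worth, I understand Linux and Windows default to different process start methods, which might be the relevant difference here (my assumption, I haven't verified that this is the cause):

    import multiprocessing

    # On Linux the default start method is 'fork'; on Windows it is 'spawn',
    # which imports the main module afresh in every child process.
    print(multiprocessing.get_start_method())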

    Using `if __name__ == '__main__'` inside a `loop` is not doing what `multiprocessing` requires. Read [what-does-if-name-main-do](https://stackoverflow.com/questions/419163/what-does-if-name-main-do) – stovfl Feb 05 '19 at 11:37
  • There seems to be an issue with Python 3.7 multiprocessing on Windows. See https://stackoverflow.com/questions/54506728/problem-using-python-multiprocessing-pool-map-becomes-intractable-in-python-3 – can you try downgrading? – deets Feb 05 '19 at 11:40
  • @deets That's not it. I tried with 3.6 first and upgraded to 3.7 only after it wouldn't work. Since I've been using 3.7 on my Ubuntu machine (installed it more recently, so it was 3.7 by default), I thought that might be it. Will gladly read the link though. – Stefan Feb 05 '19 at 12:33
  • Try getting simpler examples to work. The stuff you do with the embedded `if __main__` looks weird at the very least. You should also clearly separate setting up your worker pool from setting up the data and using it, instead of rolling everything into each other and potentially causing interrelated bugs. – deets Feb 05 '19 at 14:01
  • @stovfl I've seen you've removed the tkinter tag since my code doesn't involve tkinter. Indeed it doesn't. But couldn't the problem stem from tkinter? Reading the link you've provided, I stumbled upon _**Only When Your Module Is the Main Program**_, which wouldn't be the case with tkinter, would it? In any case, any pointers on how to correctly place the guard segment? – Stefan Feb 05 '19 at 14:02
  • @deets I have successfully managed to run other, simpler examples on multiple cores. The `if __main__` statement was added afterwards, since it was not required to run on Ubuntu (I do realize it might not be best practice), as a desperate attempt to run it on Windows. In terms of separating: I don't really need to multiprocess anything else, only this particular routine/method of a bigger program with a LOT of methods. Changing the structure now would involve a ridiculous amount of work. I didn't expect this thing to get blown out of proportion that much. – Stefan Feb 05 '19 at 14:07
  • Moving the pool out of your for-loop is not a ridiculous amount of work. In general, multiprocessing works best if the workers are spawned as early as possible in running the program. And there is absolutely no harm in letting them sit and wait until they are being used by this one functionality. So put the pool creation at the main of your actual program. Then use it inside this function here. – deets Feb 05 '19 at 14:23
  • @Stefan: *"But couldn't the problem stem from tkinter?"*: In general, I didn't see any problem running `tkinter` with `multiprocessing`. Windows behaves differently, read [runtimeerror-on-windows-trying-python-multiprocessing](https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing). For the usage of `if ... __main__` see [module-multiprocessing](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing) – stovfl Feb 05 '19 at 14:27
  • Your manager creation is not protected from recreation. Read [here](https://stackoverflow.com/q/52693216/9059420). – Darkonaut Feb 05 '19 at 15:56
  • @deets From the way I understand the workflow of Python's multiprocessing (which might very well be wrong), I would have to put `p = multiprocessing.Process(target=...)`, as well as `p.start()`, in the main file upon initializing my application – correct? – Stefan Feb 07 '19 at 10:46
  • Yes. Do this ASAP. – deets Feb 07 '19 at 10:50
  • To elaborate on the ASAP (a bit concise): set up as early as possible, even before spinning up the GUI. Not "go work on this as soon as possible" ;) – deets Feb 07 '19 at 11:28
  • @deets YESSIR! ... oh. Okay... ;) However, I can't really define `p = multiprocessing.Process(...)` then, as I won't have the target's input arguments yet. Most of them are members of the application class that gets initialized with the window, and they are supposed to be changed or even computed during this particular routine, which is a member of the GUI's subclass. So I would have to pass a crapload of arguments through all instances, wouldn't I? Or can I somehow circumvent the need for defining a target for `p` while initializing? – Stefan Feb 07 '19 at 12:25
  • You might need different abstractions. Spawn the workers, but let them sit on a multiprocessing queue. This queue is what needs passing around. And then push work packages down its throat and gather the results (see the sketch after this thread). – deets Feb 07 '19 at 13:00
  • @deets I see. So I'd have to take a "completely" different approach. – Stefan Feb 07 '19 at 14:06
  • I wouldn't call it that, because it's just wrapping the calls. – deets Feb 07 '19 at 18:50
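
A minimal sketch of the queue-based pattern deets describes, with hypothetical names like `worker` and `task_queue`: the workers are spawned once at program start and then fed work packages through a queue.

    import multiprocessing

    def worker(task_queue, result_queue):
        # Each worker blocks on the queue and processes packages until it
        # receives the None sentinel.
        for chunk_id, data in iter(task_queue.get, None):
            result_queue.put((chunk_id, [x * x for x in data]))  # placeholder work

    if __name__ == '__main__':
        task_queue = multiprocessing.Queue()
        result_queue = multiprocessing.Queue()

        # Spawn the workers once, as early as possible, e.g. before the GUI starts.
        workers = [multiprocessing.Process(target=worker, args=(task_queue, result_queue))
                   for _ in range(multiprocessing.cpu_count())]
        for w in workers:
            w.start()

        # Later, e.g. from the export routine: push work packages, gather results.
        chunks = [(i, list(range(3 * i, 3 * i + 3))) for i in range(8)]
        for chunk in chunks:
            task_queue.put(chunk)
        results = dict(result_queue.get() for _ in chunks)

        # On shutdown: one sentinel per worker, then join.
        for _ in workers:
            task_queue.put(None)
        for w in workers:
            w.join()
        print([results[i] for i in range(len(chunks))])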

0 Answers