
EDIT

The proposed code actually worked! I was simply running it from within an IDE that wasn't showing the outputs.

I'm leaving the question up because the comments/answers are instructive.


I need to split a big job across many workers. In trying to figure out how to do this, I used the following simple example, with code mostly taken from here. Basically, I am taking a list, breaking it up into shorter sublists (chunks), and asking multiprocessing to print the content of each sublist with a dedicated worker:

import multiprocessing as mp
from math import ceil

# Breaking up the long list into chunks:
def chunks(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]

# Some simple function that each worker will run on its chunk
def do_job(job_id, data_slice):
    for item in data_slice:
        print("{}_{}".format(job_id, item))

I then do this:

if __name__ == '__main__':

    # My "long" list
    l = [letter for letter in 'abcdefghijklmnopqrstuvwxyz']

    my_chunks = chunks(l, ceil(len(l)/4))

At this point, my_chunks is as expected:

[['a', 'b', 'c', 'd', 'e', 'f', 'g'],
 ['h', 'i', 'j', 'k', 'l', 'm', 'n'],
 ['o', 'p', 'q', 'r', 's', 't', 'u'],
 ['v', 'w', 'x', 'y', 'z']]

Then:

    jobs = []
    for i, s in enumerate(my_chunks):
        j = mp.Process(target=do_job, args=(i, s))
        jobs.append(j)
    for j in jobs:
        print('starting job {}'.format(str(j)))        
        j.start()

Initially, I wrote the question because I was not getting the expected printouts from the `do_job` function.

Turns out the code works just fine when run from command line.
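
For anyone reproducing this from a terminal, here is a self-contained version of the snippet above with an explicit join() added at the end, so the parent waits for every worker before exiting; the join() calls are my addition, not part of the original snippet:

import multiprocessing as mp
from math import ceil

def chunks(l, n):
    # Split l into sublists of length n (the last one may be shorter)
    return [l[i:i+n] for i in range(0, len(l), n)]

def do_job(job_id, data_slice):
    for item in data_slice:
        print("{}_{}".format(job_id, item))

if __name__ == '__main__':
    l = list('abcdefghijklmnopqrstuvwxyz')
    my_chunks = chunks(l, ceil(len(l) / 4))
    jobs = [mp.Process(target=do_job, args=(i, s))
            for i, s in enumerate(my_chunks)]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()  # wait for this worker to finish before the script exits

The interleaving of lines across workers is still nondeterministic, but every line is printed before the script returns.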

MPa
  • `print("{}_{}".format(job_id, item) ` is missing a closing `)` in your above code. It's quite possible that the processes are hitting this as an unhandled exception that isn't being displayed. – roganjosh Oct 09 '16 at 14:50
  • Thanks @roganjosh, but that was not it (transcription error, not error in the code on my computer). Edited the post to add the missing `)`. – MPa Oct 09 '16 at 14:57
  • Hmm ok. Well in that case, I cannot replicate this issue. The printed output is garbled due to all the child processes being started and run at slightly different times. However, the output _is_ there in my case. – roganjosh Oct 09 '16 at 15:04
  • I think Simon might have the solution to your problem. You may or may not be interested in a generic answer I wrote [here](http://stackoverflow.com/questions/39750873/python-multi-threading-in-a-recordset/39753853#39753853) that splits something more calculation-heavy across multiple processes and aggregates the results. – roganjosh Oct 09 '16 at 15:16
  • Well, this is embarrassing: turns out everything was ok all along, but I was trying to run the code from within a console in Spyder. As soon as I used the command line instead, as indicated in the link I had provided (!), everything worked fine. I did however learn a lot, and thank you for your generic answer in the link you provided. I will delete this question later. – MPa Oct 10 '16 at 00:56
  • Please do _not_ delete this question! At the very least, it will damage your account, and doing it several times will get you banned from asking. People have taken time to answer and there is still info in here; it is unfair to wipe our contributions. – roganjosh Oct 10 '16 at 01:15
  • @roganjosh indeed ok. I'm a relative newbie, I didn't know what the etiquette called for in this situation, but I'll gladly leave this up. I modified the question to show it was misguided, and thank you again for your help! – MPa Oct 10 '16 at 01:33
  • No worries, glad you got it sorted :) in the end there was an issue and you solved it. In future, leave the question as it is and if you also solve the problem at a later time, post that as an answer too. I guessed you didn't know the system and didn't want you to find yourself with problems later; it doesn't take too many such things for the algorithm to flag your account if you're new. – roganjosh Oct 10 '16 at 01:38

1 Answer


Maybe it's your first time with multiprocessing? Do you wait for the processes to exit, or does the main process exit before your workers have time to complete their job?

from multiprocessing import Process
from string import ascii_letters
from time import sleep


def job(chunk):
    done = chunk[::-1]
    print(done)

def chunk(data, parts):
    divided = [None] * parts
    n = len(data) // parts
    for i in range(parts):
        divided[i] = data[i*n:n*(i+1)]
    # Hand any leftover items to the last chunk, so lengths that are not
    # an exact multiple of `parts` are not silently truncated
    divided[-1] += data[n*parts:]
    return divided


def main():
    data = list(ascii_letters)
    workers = 4
    data_chunks = chunk(data, workers)
    ps = []
    for i in range(workers):
        w = Process(target=job, args=(data_chunks[i],))
        w.daemon = True  # note: 'daemon'; the misspelled 'deamon' would be silently ignored
        w.start()
        ps.append(w)
    sleep(2)  # crude wait so the daemon children can finish before main() returns



if __name__ == '__main__':
    main()
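
A more deterministic alternative to the fixed sleep(2) above is to join() each worker, which blocks until that child has exited. Here is a minimal sketch of main() rewritten that way, reusing the job() and chunk() definitions from the answer:

def main():
    data = list(ascii_letters)
    workers = 4
    data_chunks = chunk(data, workers)
    ps = [Process(target=job, args=(c,)) for c in data_chunks]
    for p in ps:
        p.start()
    for p in ps:
        p.join()  # block until this worker has finished; no sleep() guesswork

With join(), the workers need not be daemons, and the wait lasts exactly as long as the slowest child rather than a hard-coded two seconds.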
Simon
  • If you don't intend to allow your child processes to have their own children, make them daemons, and your main process will try to terminate its daemon children when it exits. – Simon Oct 09 '16 at 15:09
  • Interesting, I hadn't noticed the `daemon` flag before. I'm not getting a clear picture from the documentation on the difference between this and `join()` – roganjosh Oct 09 '16 at 15:12
  • Read up on it, it's one of those lovely python features that simplify life! – Simon Oct 09 '16 at 15:15
  • Would you mind qualifying my understanding from what I'm reading? `join()` suspends the parent process until all children finish working, then proceeds. `daemon` simply means that the parent process cannot exit without the children being complete, but that doesn't mean the parent cannot continue to do something else entirely in the meantime? – roganjosh Oct 09 '16 at 15:22
  • I've only written two big programs that required processes. That being said: `process.daemon = True` sets the daemon flag in the child process, and the parent upon exiting will natively "try" to terminate its daemon children; this is not a guarantee and you might end up with "zombie" processes. `process.join()` is called on a child and blocks until that child process terminates by itself, so you don't end up with zombies. But I don't think I've ever written code that required me to wait for self-termination; I simply call `process.terminate()` and then `process.join()`. (See the sketch after these comments for the difference in practice.) – Simon Oct 09 '16 at 15:45
  • @Simon That worked...but it turns out, embarrassingly, that the code I had posted also worked. I accepted the answer because I learned much from it (esp. on daemons and via the comments). – MPa Oct 10 '16 at 00:57
  • @Simon Note that the chunk function you proposed will only work as expected if the number of elements in the original list is a multiple of the number of chunks. – MPa Oct 10 '16 at 01:34
  • Glad to help. I should mention that a daemon process can't have its own children. So if you have, say, a REST server, and have committed yourself to a process that is essentially a listener that only receives messages and passes them to a child process that writes to a database, the listener can't be a daemon: Python will raise an exception, and the listener will require some good exit strategy. – Simon Oct 11 '16 at 08:22
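
To make the daemon-versus-join() distinction discussed above concrete, here is a small experiment of my own (not from the thread): with daemon = True and no join(), the parent exits immediately and the child is terminated before it can print; add the join() and the message appears.

from multiprocessing import Process
from time import sleep

def slow_worker():
    sleep(1)
    print("worker finished")

if __name__ == '__main__':
    p = Process(target=slow_worker)
    p.daemon = True  # daemon children are terminated when the parent exits
    p.start()
    # With no join() here, the parent exits right away and the daemon child
    # is killed before it prints anything. Uncomment to see the output:
    # p.join()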