Yes, Thread here is threading.Thread.
The Joiner is a way to tell each worker thread to go ahead and exit. If you write workers that consume jobs, process each one, and then continue waiting, the worker will never exit on its own. The Joiner is one way of flagging a worker to exit.
There are two common ways of getting a worker to exit: either a) give it a fixed list of things to do, instead of sending it an open-ended stream of jobs in a queue, or b) send each worker a special "stop processing" sentinel job. For instance, if you send numbers to your workers, send None to get a worker to stop.
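Approach (b) might look like the following sketch, using the standard queue module. The names (worker, jobs, results) are just for illustration:

```python
import threading
import queue

def worker(jobs, results):
    # keep pulling jobs until we see the None sentinel
    while True:
        n = jobs.get()
        if n is None:          # "stop processing" flag
            break
        results.put(n * n)

jobs = queue.Queue()
results = queue.Queue()
threads = [threading.Thread(target=worker, args=(jobs, results))
           for _ in range(3)]
for t in threads:
    t.start()
for n in range(10):
    jobs.put(n)
for _ in threads:
    jobs.put(None)             # one sentinel per worker
for t in threads:
    t.join()
squares = sorted(results.get() for _ in range(10))
print(squares)                 # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Note that you need one sentinel per worker, since each worker consumes exactly one None before exiting.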
Here's an example of the first approach. There's a fixed list of chunks, each of which has a list of tasks. Start 3 workers, divide each chunk into 3 lists of tasks, and give each fixed-size list to a worker. Wait for the workers to finish their lists, then continue with the next chunk.
In the following code we count vowels in the words of a dictionary file. Every 1000 words in the dictionary is a "chunk". Each worker processes 300-400 words, runs in parallel with the others, and exits after its list of words is done. After the first chunk of 1000 words is done, the program loops to handle the next chunk of 1000 words.
Thus, each chunk is run sequentially, but each chunk is handled in 3 parallel pieces.
Have fun!
    # break list of tasks into chunks. All tasks for a specific chunk run
    # in parallel, but chunks are run sequentially.
    import re
    from threading import Thread

    # dictionary with ~100K words, each on a separate line
    wordsf = open('/usr/share/dict/words')

    def count_t(words):
        num_vowels = 0
        consonant_pat = re.compile('[^aeiou]', re.IGNORECASE)
        for word in words:
            # zap consonants; what's left is vowels
            num_vowels += len(consonant_pat.sub('', word))
        print('\t{} vowels'.format(num_vowels))

    while True:
        # read 1000 words into a chunk
        chunk = [wordsf.readline() for _ in range(1000)]
        # exit when the first word is empty (end of file)
        if not chunk or not chunk[0]:
            break
        print('CHUNK {} to {}'.format(
            chunk[0].rstrip(), chunk[-1].rstrip(),
        ))
        # create 3 worker threads; each gets a separate part of the chunk
        workers = [
            Thread(target=count_t, args=(chunk[:300],)),
            Thread(target=count_t, args=(chunk[300:600],)),
            Thread(target=count_t, args=(chunk[600:],)),
        ]
        # run this chunk's workers in parallel
        for worker in workers:
            worker.start()
        # wait for them to complete
        for worker in workers:
            worker.join()
        # continue to next chunk
Example output:
CHUNK A to Ashikaga
897 vowels
956 vowels
1338 vowels
CHUNK Ashikaga's to Boone's
909 vowels
841 vowels
996 vowels
Here are the last three chunks being processed:
CHUNK winter's to yum
798 vowels
734 vowels
825 vowels
CHUNK yummier to
364 vowels
0 vowels
0 vowels