
Context: I've got a simple Python script that writes a bunch of files (~70-100) to S3 every few seconds. Because it's I/O bound, I wanted to thread the write process so the script performs better. I'm using the threading module to build my threads.

Question: Since my threads a) are non-daemon and b) have only one task to perform (write a file), if I loop over my list of threads and call .join(), will they finish their task and exit gracefully? Do I even need to call join() here, or will they just exit when they're done? I believe join() is the way to go, but since I'm very new to Python, I don't know what I don't know...

Here's some simplified code for reference:

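buildOutput()  # starts one FileWriter thread per file (see below)
for thread in threads:
    thread.join()  # block until this writer has finished
del threads[:]  # if this runs in a loop, drop finished threads so the global list doesn't keep growing
time.sleep(60)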

calls:

for item in out[fileRoot]:
    # write individual files, one thread per file
    key = findKey(item, FILE_KEY)
    full_key = FILE_PATH + str(key) + FILE_TYPE
    t = FileWriter(item, full_key)
    t.start()
    threads.append(t)  # global threads list for the script

where FileWriter is the class that does the writing.

Brad

1 Answer


join() makes sure that the calling thread (here, the main thread) waits until the joined thread finishes its task. There is a good ASCII-art illustration given here.

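So you should call join() when your child threads are performing I/O and the main thread must not move on until the writes have completed; otherwise it may carry on while some files are still being written.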
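For illustration, here is a minimal sketch of the start-then-join pattern. The FileWriter below is a hypothetical stand-in for the class in the question (its real code isn't shown): it records the "write" in a dict instead of calling S3, and the sink and lock names are assumptions made up for this example.

```python
import threading


class FileWriter(threading.Thread):
    """Hypothetical stand-in for the question's FileWriter (no real S3 call)."""

    def __init__(self, item, full_key, sink, lock):
        super().__init__()  # non-daemon by default when created from the main thread
        self.item = item
        self.full_key = full_key
        self.sink = sink
        self.lock = lock

    def run(self):
        # A real implementation would upload self.item to S3 here.
        with self.lock:
            self.sink[self.full_key] = self.item


def write_all(items):
    """Start one thread per item, then wait for all of them to finish."""
    sink, lock = {}, threading.Lock()
    threads = [FileWriter(item, key, sink, lock) for key, item in items.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # returns once this thread's run() has completed
    return sink, threads
```

Starting every thread before joining any of them is what lets the writes overlap; joining inside the first loop would run them one at a time.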

sgun
  • Right, but what happens to the thread when it's done with its task? Are there any mechanisms by which I can be sure that nothing is left floating around, like a zombie process, or that there are no garbage collection concerns? – Brad Jul 22 '13 at 17:17
  • It is treated like any other method: it is deleted from the stack. Since we are talking about threads, there won't be anything like a zombie process because, as you said, they finish their task. However, without join, if the main thread exits while the others haven't finished, their stack will remain (not good). – sgun Jul 22 '13 at 17:27
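One way to check what happens to a thread once its task is done is to ask the threading module directly. The sketch below is only an illustration: write_file and "example-key" are hypothetical names, not the question's code. After join() returns, is_alive() reports False and the thread no longer appears in threading.enumerate(). Note also that the threading documentation says a Python program exits only when no non-daemon threads are left, so non-daemon writers are not cut off at interpreter exit.

```python
import threading


def write_file(results, key):
    # Hypothetical one-shot task standing in for a single file write.
    results.append(key)


results = []
t = threading.Thread(target=write_file, args=(results, "example-key"))
t.start()
t.join()  # wait for the task to complete

finished = not t.is_alive()                 # the thread has terminated
still_listed = t in threading.enumerate()   # enumerate() lists only live threads
```

Once nothing else references the Thread object (for example, after it is removed from the global threads list), it can be garbage collected like any other object.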