
There are several questions on this topic, but I couldn't find an answer to mine. Even the Python docs aren't that descriptive.

My problem is simple: I want to break up a huge list into pieces and process each piece in parallel.

So my question is: does the interpreter wait until all threads are finished before it starts the downstream lines of the program (in my case, consolidation of the processed list), or do I have to define the downstream process as a separate thread and use join?

Although I read the post on the topic (Thread vs. Threading), I still couldn't understand the difference between thread and threading.

Please direct me to a good text on the topic. The docs are not very informative.

PS (@zzk): Even if I use multiprocessing, how do I execute common code after all the processes end? For example, 5 processes produce 5 lists, and I then have to merge these lists, sort them, and write them to a file.

[the code is not exact and is just for explaining the situation]

from multiprocessing import Process

def fun(x, y):
    y = someprocessing(x)  # type(y) = list

if __name__ == '__main__':
    for i in listofprocesses:
        p = Process(target=fun, args=(i, y))
        p.start()

# DOWNSTREAM CODE #
yy = y1 + y2 + y3 + y4 + y5
yy.sort()
for j in yy:
    outfile.write(j)

I want the lists (y) produced by the different processes to be merged. There are two doubts here:

  1. Since the variable name is the same, do I have to pass the output list (y) as an argument?

  2. Assuming so, and that all the processed lists are saved as y1, y2, y3, y4 & y5, will the downstream code be executed? How do I make sure that all the processes have ended?

WYSIWYG

1 Answer


threading or thread won't help you here, due to the GIL.

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.
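
To see why, here is a minimal, hypothetical timing demo (not part of the original answer): CPU-bound work spread over four threads barely beats running it on one, because the GIL lets only one thread execute Python bytecode at a time. It also shows join(), which is how you make the main program wait; without it, the interpreter does not wait for threads on its own:

    import threading
    import time

    def cpu_work(n):
        # Pure-Python CPU-bound loop; threads cannot run this in parallel
        total = 0
        for i in range(n):
            total += i * i

    start = time.time()
    threads = [threading.Thread(target=cpu_work, args=(5000000,)) for _ in range(4)]
    for t in threads:
        t.start()  # start() returns immediately; the main program does NOT wait here
    for t in threads:
        t.join()   # join() blocks until that thread has finished
    print("4 threads took", time.time() - start, "seconds")
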

You may need multiprocessing instead.
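
As for your PS: a minimal sketch (Python 3; someprocessing and the file name are placeholders, as in your question) would split the list into chunks, hand them to a Pool, and only then merge, sort, and write. pool.map blocks until every worker has finished, so the downstream code is guaranteed to see all five result lists:

    from multiprocessing import Pool

    def someprocessing(chunk):
        # Placeholder for the real per-chunk work from the question
        return [x * 2 for x in chunk]

    if __name__ == '__main__':
        data = list(range(100))  # the huge list, shortened for the example
        n = 5
        chunks = [data[i::n] for i in range(n)]

        with Pool(processes=n) as pool:
            results = pool.map(someprocessing, chunks)  # blocks until all workers are done

        yy = [item for sub in results for item in sub]  # merge the 5 lists
        yy.sort()
        with open('out.txt', 'w') as outfile:
            for j in yy:
                outfile.write('%s\n' % j)

This also answers your first doubt: each worker runs in its own process, so assigning to y inside fun never reaches the parent. Results have to come back by being returned (as map does here) or via something like a multiprocessing.Queue.
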

zzk