I am writing a Python program using mpi4py, from which I import MPI. I then set up the global communicator MPI.COMM_WORLD and store it in the variable comm.
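For reference, a minimal sketch of my setup (the rank/size variables are just for illustration):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD    # global communicator
rank = comm.Get_rank()   # this process's rank
size = comm.Get_size()   # total number of processes
```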
I am running this code with n > 1 processes (MPI ranks), and at some point they all enter a for loop; every rank has the same number of iterations to go through.
Inside the for loop I have a "comm.reduce(...)" call. This works for a small number of processes, but as the problem size increases (with 64 processes, say) my program "hangs".
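The loop looks roughly like this (a simplified sketch of the pattern, not my actual code; the local value here is a placeholder for my real per-rank computation):

```python
n_iterations = 100  # every rank runs the same number of iterations

for i in range(n_iterations):
    local_value = rank * i  # placeholder for the real per-rank work
    total = comm.reduce(local_value, op=MPI.SUM, root=0)
    if rank == 0:
        print(f"iteration {i}: total = {total}")
```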
So I am wondering if this has to do with the reduce(...) call. I know that this call needs all processes to participate (that is, say we run 2 processes in total: if one process reaches the reduce(...) call but the other doesn't for whatever reason, the program will hang, because reduce(...) waits for both).
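To illustrate the failure mode I mean, here is a deliberately broken toy example (hypothetical, not my actual code) that hangs when run on more than one process:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Rank 1 never calls the collective, so every other rank blocks
# inside reduce() waiting for it (run with at least 2 processes).
if rank != 1:
    total = comm.reduce(rank, op=MPI.SUM, root=0)
```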
My question is: is the reduce call a "synchronization" point, i.e., does it act like a "comm.Barrier()" call? And, more generally, which MPI calls synchronize the processes (if any besides Barrier)?