Generally, what is the overhead of an OpenMP barrier, in terms of clock cycles?
I mean the following:
Suppose all threads finish their work at the same time, so they all reach the barrier simultaneously. How many extra clock cycles does it take for them to get past the barrier?
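One way to get a number for a specific machine and runtime is to time many back-to-back barriers with no work between them, so every thread arrives at (nearly) the same moment. The sketch below assumes a C compiler with OpenMP support (e.g. built with -fopenmp). The function name barrier_cost_ns is hypothetical, and it reports nanoseconds rather than cycles. Without OpenMP the pragmas are ignored and the code just runs serially.

```c
#ifdef _OPENMP
#include <omp.h>
/* Wall-clock time in seconds from the OpenMP runtime. */
static double now_s(void) { return omp_get_wtime(); }
#else
#include <time.h>
/* Serial fallback when compiled without OpenMP (C11 timespec_get). */
static double now_s(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
#endif

/* Hypothetical helper: average cost of one "#pragma omp barrier" in ns.
   The loop body is empty, so all threads arrive at each barrier at
   (nearly) the same time, which is the scenario being asked about. */
double barrier_cost_ns(int reps) {
    double elapsed = 0.0;          /* shared; written only by the master thread */
    #pragma omp parallel
    {
        double t0 = 0.0;           /* private to each thread */
        #pragma omp barrier        /* warm-up: make sure the whole team has started */
        #pragma omp master
        t0 = now_s();
        for (int i = 0; i < reps; ++i) {
            #pragma omp barrier
        }
        #pragma omp master
        elapsed = now_s() - t0;
    }
    return reps > 0 ? elapsed * 1e9 / reps : 0.0;
}
```

Calling barrier_cost_ns(100000) with different OMP_NUM_THREADS values, and with OMP_WAIT_POLICY set to ACTIVE versus PASSIVE, shows how much the result depends on thread count and on whether idle threads spin or sleep. To express the result in cycles, multiply by the core clock in GHz (cycles ≈ ns × f_GHz). Frequency scaling makes this only an estimate.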
Does synchronizing already-running threads on Linux involve calls into the OS kernel?
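When all threads arrive together, a barrier does not fundamentally need the kernel. Threads can meet using atomic operations on shared memory and spin in user space. The sketch below is a textbook centralized sense-reversing barrier built on C11 atomics and pthreads (compile with -pthread). It is not claimed to be how any particular OpenMP runtime works, and spin_barrier and run_demo are hypothetical names. Runtimes such as GCC's libgomp typically spin for a while and then fall back to a futex system call to sleep, so kernel involvement depends on how long threads wait.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREADS 16

/* Centralized sense-reversing barrier: waiting is pure user-space spinning. */
typedef struct {
    atomic_int remaining;   /* threads still to arrive in the current phase */
    atomic_bool sense;      /* flips each time the barrier opens */
    int nthreads;
} spin_barrier;

void spin_barrier_init(spin_barrier *b, int nthreads) {
    atomic_init(&b->remaining, nthreads);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* *local_sense is per-thread state that starts out false. */
void spin_barrier_wait(spin_barrier *b, bool *local_sense) {
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&b->remaining, 1) == 1) {
        /* Last thread to arrive: reset the count, then release everyone. */
        atomic_store(&b->remaining, b->nthreads);
        atomic_store(&b->sense, *local_sense);
    } else {
        /* Busy-wait; no system call is made here. */
        while (atomic_load(&b->sense) != *local_sense) {
        }
    }
}

typedef struct {
    spin_barrier *bar;
    atomic_int *counter, *errors;
    int nthreads, rounds;
} demo_args;

static void *demo_worker(void *p) {
    demo_args *a = p;
    bool local_sense = false;
    for (int r = 0; r < a->rounds; ++r) {
        atomic_fetch_add(a->counter, 1);
        spin_barrier_wait(a->bar, &local_sense);
        /* Between the two barriers nobody increments, so the count is exact. */
        if (atomic_load(a->counter) != (r + 1) * a->nthreads)
            atomic_fetch_add(a->errors, 1);
        spin_barrier_wait(a->bar, &local_sense);
    }
    return NULL;
}

/* Returns the number of rounds where the barrier failed (0 expected),
   or -1 for an unsupported thread count. Error handling for
   pthread_create is omitted for brevity. */
int run_demo(int nthreads, int rounds) {
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    spin_barrier bar;
    atomic_int counter, errors;
    spin_barrier_init(&bar, nthreads);
    atomic_init(&counter, 0);
    atomic_init(&errors, 0);
    demo_args args = { &bar, &counter, &errors, nthreads, rounds };
    pthread_t tids[MAX_THREADS];
    for (int i = 0; i < nthreads; ++i)
        pthread_create(&tids[i], NULL, demo_worker, &args);
    for (int i = 0; i < nthreads; ++i)
        pthread_join(tids[i], NULL);
    return atomic_load(&errors);
}
```

The cost of the fast path here is roughly one atomic read-modify-write per thread plus the cache-line transfers needed to propagate the flipped sense flag, which is why simultaneous arrival can stay out of the kernel. Pure spinning wastes CPU when threads arrive far apart or outnumber cores, which is the reason real runtimes eventually block in the kernel instead.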
Thanks.
Related:
How is thread synchronization implemented, at the assembly language level?