Generally, how much overhead does an OpenMP barrier have, in terms of clock cycles?

I mean the following:

Suppose all threads have finished their work at the same time, so they all reach the start of the barrier at the same time. How many extra clock cycles does it take to get past the barrier?

Does synchronizing existing threads on Linux involve calls to the kernel of the OS?

Thanks.

Related:

How is thread synchronization implemented, at the assembly language level?

https://spcl.inf.ethz.ch/Publications/.pdf/atomic-bench.pdf

R zu
  • Why don't you measure it yourself? You can use the [OpenMP Micro Benchmark suite](https://www.epcc.ed.ac.uk/research/computing/performance-characterisation-and-benchmarking/epcc-openmp-micro-benchmark-suite) for that for example. – Gilles Oct 04 '18 at 15:28
  • I just need a very rough idea. I guess synchronization takes at least 20 ns, which is about 200 clock cycles ... So I had better have the CPU perform 10^3 instructions before each synchronization. – R zu Oct 04 '18 at 15:31
  • Nope. The micro-benchmark says 0.5 microseconds for a barrier for 8 threads. I should avoid barriers like the plague. Thanks, Gilles. – R zu Oct 04 '18 at 15:40
  • Maybe you could self-answer the question based on your measurements? It would also be important to add some info about the architecture and OpenMP runtime to the question/answer. Avoiding barriers is a good idea in general for many reasons ;-). – Zulan Oct 05 '18 at 07:53
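
For a rough number on your own machine, a minimal sketch in C follows. It is not the EPCC suite's method, just a loop of back-to-back barriers timed with `omp_get_wtime`, and it does not subtract the (small) loop overhead. The names `now_sec`, `measure_barrier_ns`, and `ns_to_cycles` are hypothetical, and the clock frequency passed to `ns_to_cycles` is an assumption you have to supply for your CPU. If the code is built without OpenMP enabled, the pragmas are ignored and the result is meaningless.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime in the non-OpenMP fallback */
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds; uses clock_gettime when OpenMP is not enabled. */
static double now_sec(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Average time per barrier in nanoseconds over `reps` back-to-back barriers.
   Hypothetical helper; returns 0.0 if reps is not positive. */
double measure_barrier_ns(int reps) {
    double elapsed = 0.0;
    if (reps <= 0) {
        return 0.0;
    }
    #pragma omp parallel
    {
        /* Line all threads up so they enter the timed loop together. */
        #pragma omp barrier
        double t0 = now_sec();
        for (int i = 0; i < reps; ++i) {
            #pragma omp barrier
        }
        double t1 = now_sec();
        /* Only one thread records the result, to avoid a data race. */
        #pragma omp master
        elapsed = t1 - t0;
    }
    return elapsed / reps * 1e9;
}

/* Convert nanoseconds to clock cycles for an assumed frequency in GHz. */
double ns_to_cycles(double ns, double ghz) {
    return ns * ghz;
}
```

Compile with something like `gcc -O2 -fopenmp` and call `measure_barrier_ns(100000)` from `main`. As a unit check, the 0.5 microseconds reported above for 8 threads corresponds to `ns_to_cycles(500.0, 3.0)`, i.e. 1500 cycles at an assumed 3 GHz clock.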
