
I need to study the OpenMP sources in GCC. I have read the OpenMP documentation (3.0 and 4.0). As far as I know, OpenMP uses a work-sharing mechanism. As I understand it, the work-sharing mechanism passes tasks between threads while the threads are running. Or is the data distributed among the threads before they start executing?

  • Learning to [implement the work-sharing yourself can teach you a lot](http://stackoverflow.com/a/30591616/2542702). – Z boson Apr 13 '16 at 06:18

2 Answers


If you are using OpenMP with tasks, the tasks are stored in one or more task queues. If a thread finds itself idle, it will steal tasks from a neighboring queue. This is handled internally by libgomp, GCC's OpenMP runtime.
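As a rough illustration of the task model described above, here is a minimal sketch in which one thread creates a task per array element and the rest of the team picks them up. The function name `square_all` is hypothetical, and how the runtime queues and hands out the tasks is left entirely to the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: squares every element, one OpenMP task per element. */
void square_all(long *data, size_t n)
{
    assert(data != NULL || n == 0);

    #pragma omp parallel
    #pragma omp single
    {
        /* Only one thread runs this loop and creates the tasks;
           the other threads of the team are free to execute them. */
        for (size_t i = 0; i < n; ++i) {
            #pragma omp task firstprivate(i) shared(data)
            data[i] = data[i] * data[i];
        }
    } /* implicit barrier at the end of single: all tasks are done here */
}
```

Compiled with `gcc -fopenmp`, the tasks may run on any thread of the team; without `-fopenmp` the pragmas are ignored and the loop simply runs serially.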

If you use an OpenMP parallel for with a static schedule, no task stealing takes place: the iterations are assigned to the threads before the loop starts running.

If you use an OpenMP parallel for with a dynamic schedule, the threads in the team divide the work at run time, so idle threads take over pending work from the rest of the team.
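The difference between the two schedules can be sketched as follows. The function names `work`, `sum_static` and `sum_dynamic` and the chunk size of 4 are assumptions made for illustration; the workload is deliberately unbalanced so that a dynamic schedule has something to even out.

```c
#include <assert.h>

/* Hypothetical workload: the cost grows with i, so iterations are unbalanced. */
static long work(int i)
{
    long acc = 0;
    for (int k = 0; k <= i; ++k)
        acc += k;
    return acc;
}

/* Static schedule: the iteration-to-thread mapping is fixed up front. */
long sum_static(int n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += work(i);
    return total;
}

/* Dynamic schedule: a thread requests the next chunk of 4 iterations
   whenever it finishes its current chunk. */
long sum_dynamic(int n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += work(i);
    return total;
}
```

Both functions return the same result; only the way iterations are handed to threads differs. With the dynamic schedule, each chunk request involves some coordination between threads, which is the run-time cost the answer refers to.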

In general, whenever threads need to communicate at run time, they spend cycles that would otherwise go to useful computation.

Klaas van Gend

Complementing @klaas-van-gend's answer: for a libgomp thread to start stealing tasks, it must be idle AND not be inside any active taskwait construct (explicit or implicit).

For example, think of a binary tree representing a task graph. If the thread that created the root node is not fast enough to start running one of its two children, it will be idle until the execution of its child tasks is finished.

This behavior is observed in GCC 9.1.

If we run this code with libgomp, we can observe the behavior in the generated Graphviz graph. Colors and numbers inside parentheses represent a core/thread. The number outside the parentheses indicates the computational weight of the task, and the number on each edge is the time when the task started to run.

[Figure: libgomp task tree]

As we can see, core 1 (blue) stayed idle until the end of its taskwait construct. Core 0 (white) only stole task 6 after the end of the taskwait created by task 4. The same holds for core 3 (green) and task 12.

However, if we run this code with Clang/LLVM and its libomp implementation, we get a full work-stealing algorithm. No core is idle at any time. This behavior is observed with Clang 8 :)

[Figure: libomp task tree]
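For readers who want to experiment, a binary task tree of the kind described in this answer can be sketched as below. This is an illustrative example, not the benchmark that produced the graphs; the names `count_nodes` and `count_tree` are hypothetical. Each node spawns two child tasks and then sits in a taskwait, which is exactly where a thread's ability to pick up other work depends on the runtime.

```c
#include <assert.h>

/* Hypothetical task tree: every inner node spawns two child tasks
   and waits for both before returning its subtree's node count. */
static long count_nodes(int depth)
{
    if (depth == 0)
        return 1; /* leaf */

    long left = 0, right = 0;

    #pragma omp task shared(left)
    left = count_nodes(depth - 1);

    #pragma omp task shared(right)
    right = count_nodes(depth - 1);

    /* The creating thread blocks here until both children are finished. */
    #pragma omp taskwait

    return 1 + left + right;
}

/* Builds the whole tree from a single root task inside a parallel region. */
long count_tree(int depth)
{
    assert(depth >= 0);
    long total = 0;

    #pragma omp parallel
    #pragma omp single
    total = count_nodes(depth);

    return total;
}
```

A full binary tree of depth d has 2^(d+1) - 1 nodes, so the result is easy to check. Building the same file with `gcc -fopenmp` and `clang -fopenmp` and adding per-task timing would be one way to compare the two runtimes.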

648trindade