2

I was looking further into pthread barriers following the pthread tutorial at pthread Tutorial - Peter Chapin, 3.2 Barriers pg 11 going through the use of two barriers in the thread function, the first suspends all threads until all have reached the loop_barrier confirmed by the arbitrary thread elected to do any serial cleanup following the barrier return of PTHREAD_BARRIER_SERIAL_THREAD and a subsequent barrier in the thread function of prep_barrier which ensures all threads are suspended until the serial cleanup is done.

My understanding being that this allows threads to run continually while providing thread synchronization at a given pointer in the processing where all work on a per-cycle basis is completed before all threads continue running in a concurrent manner. The example simply shows what occurs for one such cycle and then a done flag is set and the thread function returns.

All threads do suspend and wait on the loop_barrier and prep_barrier, but the problem is that following thread function return the program stalls on the first pthread_join() which gdb explaiins, rather unhelpfully, is the result of "in pthread_barrier_destroy () from /lib64/libpthread.so.0"

The tutorial provides only a framework for the thread function and main program and I simply provided the minimum to complete it, declaring a struct to hold the different loop-limits for the for loop in each thread function and members to hold the thread index and sum of the for loop variable values. Apparently I didn't understand the barriers quite a completely as I thought I did. The code causing the hang-on-join is:

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <pthread.h>

#define handle_error_en(en, msg) \
  do { errno = en; perror(msg); exit(EXIT_FAILURE); } while (0)

#define handle_error(msg) \
  do { perror(msg); exit(EXIT_FAILURE); } while (0)

#define NCPU 4
#define ITER_PER_CPU  100

typedef struct {
  int index, start, end;
  unsigned sum;
} loop_data;

pthread_barrier_t loop_barrier;   /* global barriers (could pass in data) */
pthread_barrier_t prep_barrier;

void *thread_fn (void *data)
{
  int done = 0, 
      i = 0;
  loop_data *thread_data = data;
  
  do {
    for (i = thread_data->start; i < thread_data->end; i++) {
      /* each arg gets separate loop_data - do work */
      thread_data->sum += i;
    }
    
    /* suspend on barrier and do any per-cycle cleanup */
    if (pthread_barrier_wait (&loop_barrier) == PTHREAD_BARRIER_SERIAL_THREAD) {
      puts ("PTHREAD_BARRIER_SERIAL_THREAD");
      /* no actual per-cycle cleanup, just set done flag */
      done = 1;
    }
    /* suspend on barrier until per-cycle cleanup complete */
    pthread_barrier_wait (&prep_barrier);
    
    printf ("thread index: %d, sum: %d\n", 
            thread_data->index, thread_data->sum);
    
  } while (!done);
  
  return data;
}

int main (void) {

  pthread_t id[NCPU];
  pthread_attr_t attr;
  loop_data arr[NCPU] = {{ .start = 0 }};
  void *res;
  int rtn = 0;
  
  /* initialize barriers and validate */
  if ((rtn = pthread_barrier_init (&loop_barrier, NULL, NCPU))) {
    handle_error_en (rtn, "pthread_barrier_init-loop_barrier");
  }
  if ((rtn = pthread_barrier_init (&prep_barrier, NULL, NCPU))) {
    handle_error_en (rtn, "pthread_barrier_init-prep_barrier");
  }
  
  /* initialize thread attributes (using defaults) and validate */
  if ((rtn = pthread_attr_init (&attr))) {
    handle_error_en (rtn, "pthread_attr_init");
  }
  
  /* set data index, start, end and create/validate each thread */ 
  for (int i = 0; i < NCPU; i++) {
    /* initialize index, start / end values */
    arr[i].index = i;
    arr[i].start = i * ITER_PER_CPU;
    arr[i].end = (i + 1) * ITER_PER_CPU;
    printf ("id: %d, start: %3d, end: %3d\n", i, arr[i].start, arr[i].end);
    /* create thread and validate */
    if ((rtn = pthread_create (&id[i], &attr, thread_fn, &arr[i]))) {
      handle_error_en (rtn, "pthread_create");
    }
  }

  /* join all threads and compare sums from threads with sums in main */
  for (int i = 0; i < NCPU; i++) {
    loop_data *data = NULL;
    /* join and validate */
    printf ("joining thread index: %d\n", i);
    if ((rtn = pthread_join (id[i], &res))) {
      fprintf (stderr, "error: thread %d\n", i);
      handle_error_en (rtn, "pthread_join");
    }
    data = res;   /* pointer to return struct provided through parameter */
    printf ("thread index: %d joined\n", data->index);
  }
  
  /* destroy barriers and validate */
  if ((rtn = pthread_barrier_destroy (&loop_barrier))) {
    handle_error_en (rtn, "pthread_barrier_destroy-loop_barrier");
  }
  if ((rtn = pthread_barrier_destroy (&prep_barrier))) {
    handle_error_en (rtn, "pthread_barrier_destroy-prep_barrier");
  }
}

Example Use/Output

$ ./bin/pthread-vtctut-04
id: 0, start:   0, end: 100
id: 1, start: 100, end: 200
id: 2, start: 200, end: 300
id: 3, start: 300, end: 400
joining thread index: 0
PTHREAD_BARRIER_SERIAL_THREAD
thread index: 2, sum: 24950
thread index: 0, sum: 4950
thread index: 1, sum: 14950
thread index: 3, sum: 34950
^C

The manual interrupt provided where the code hangs on line 94 at if ((rtn = pthread_join (id[i], &res))) {. So why since each thread function is released by the second barrier (as indicated by the "thread index: x, sum: yyyy" output does the code hang on pthread_join() in main()?

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • Tip: You could greatly simplify the code by calling `handle_error_en` unconditionally, but having it return if `en == 0`. – ikegami Mar 07 '23 at 02:28
  • If I were going to do this with a barrier, then I would probably use only one barrier (your second) instead of two. In place of the first barrier, I would just use a shared counter, either atomic or mutex-protected. – John Bollinger Mar 07 '23 at 04:15
  • @ikegami - yes, that would be quite an improvement eliminating all those repeated `if` blocks on validation. – David C. Rankin Mar 07 '23 at 06:37
  • 1
    @ikegami easiest rewrite of `handle_error_en` to accomplish the change was simply having it ignore `en == 0`, e.g. `do { if (en) { errno = en; perror(msg); exit(EXIT_FAILURE); }} while (0)` – David C. Rankin Mar 07 '23 at 06:50

2 Answers2

2

According to the man page for pthread_barrier_wait, after the necessary number of threads have successfully reached the barrier, the threads will pass the barrier and the barrier will be returned to the same state it had after the most recent call to pthread_barrier_init. This means that when you use the same barrier again, it will again be 4 threads that must reach the barrier.

However, only 3 threads will reach the second iteration of the loop, because the thread that gets PTHREAD_BARRIER_SERIAL_THREAD will exit the loop and terminate.

This means that 3 threads will get stuck on pthread_barrier_wait (&loop_barrier) in the second loop iteration, because the terminated thread will never reach that barrier a second time.

For this reason, the function main will only be able to join with 1 of the 4 threads.

Andreas Wenzel
  • 22,760
  • 4
  • 24
  • 39
2

The primary issue is that the thread that receives PTHREAD_BARRIER_SERIAL_THREAD from the first barrier wait does not effectively signal the other threads to stop. It sets variable done, but this is a local variable of the thread function. Each execution of that function has its own, so one thread modifying its done is not visible to the the other threads via their dones.

Since only one of the threads stops, main will block on either the first or the second pthread_join() call, depending on which thread it happens to be that terminates. If it's the first pthread_join() call that hangs then one thread will be left terminated but unjoined. Either way, three threads will have cycled back around to the first barrier and blocked there.

Fix it by giving done static instead of automatic storage duration, so that it is shared among the threads. You can do that either by declaring it with the static keyword inside the function or by pulling it out of the function to file scope. If you want to be able to set or reset done from outside the scope of the threads running thread_fn() then the latter is the only viable alternative.

Also, the barriers are adequate synchronization primitives for the purpose of ensuring that the threads reaching them all see each others' modifications to a static-duration done, but if you want the threads to see modifications by a thread that does not reach the barrier then you'll need a different mechanism. You could additionally protect it with a mutex, but depending on exactly what you want to do, it might be easier to just make it _Atomic.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • In addition to `static`, eoes `done` need `volatile` (sufficient) or does it need `stdatomic.h` access (e.g. `atomic_load/atomic_store`)? – Craig Estey Mar 07 '23 at 04:12
  • 1
    @CraigEstey `volatile` has no role to play in synchronization. It is neither necessary nor sufficient. Other than that, read the last paragraph (possibly added since your initial read). The barriers are sufficient for making changes to a static-duration `done` by one of the threads passing through them visible to the other threads passing through them. – John Bollinger Mar 07 '23 at 04:18
  • @CraigEstey: The keyword `volatile` is not useful for thread synchronization in ISO C. However, as an extension on the Microsoft compiler, this keyword does provide additional guarantees on the x64 platform, so that it can be used for thread synchronization on that platform. See [this page](https://learn.microsoft.com/en-us/cpp/cpp/volatile-cpp?view=msvc-170) for further information. As far as I know, the Microsoft compiler is the only compiler that does this, and even Microsoft recommends to disable this for performance reasons. – Andreas Wenzel Mar 07 '23 at 04:26
  • Last paragraph... Not an issue because of the double barriers. `done` is only changed when the other threads are waiting for the second barrier or about to. – ikegami Mar 07 '23 at 04:28
  • 1
    Yes, @ikegami, I think that is part of what the last paragraph says. That paragraph also discusses what would be needed if `done` were subject to change by a thread that does not wait at the barrier. That seemed reasonable given that one of the suggested fixes was to lift `done` to file scope, and that it might be desirable to enable termination by a thread other than those running `thread_fn()`. – John Bollinger Mar 07 '23 at 04:35
  • 1
    @CraigEstey: You may find this question useful for further information about `volatile` and thread synchronization: [Why is volatile not considered useful in multithreaded C or C++ programming?](https://stackoverflow.com/q/2484980/12149471) – Andreas Wenzel Mar 07 '23 at 04:42
  • 1
    Thank you John, Andreas, all... That was the defect in my understanding. Makes you feel dumb when you put all the pieces together and, of course, `done` being local to one thread isn't sufficient. But, human frailties being what they are, concentrating on how the two barriers interacted with each other -- made overlooking that point almost inevitable (I wonder if that missing part of the thread function skeleton was intentional -- for just this affect) Either way, effective learning has occurred. – David C. Rankin Mar 07 '23 at 06:31
  • @DavidC.Rankin: In your defense, the `.pdf` file which you used as a reference was buggy in this respect. – Andreas Wenzel Mar 07 '23 at 16:04