
A simple OpenMP program:

#include <omp.h>
#include <cstdio>   // printf

int main() {
  int var = 0;
  int var2 = 0;
  #pragma omp parallel private(var) shared(var2)
  {
    var = 1;
    int tid = omp_get_thread_num();
    printf("Inside the parallel region, var gets modified by thread #%d to %d\n",tid,var);
    if(tid == 0)
      var2 = var;
  }
  printf("Outside the parallel region, var  = %d\n", var);
  printf("Outside the parallel region, var2 = %d\n", var2);
}

Result:

Inside the parallel region, var gets modified by thread #3 to 1
Inside the parallel region, var gets modified by thread #0 to 1
Inside the parallel region, var gets modified by thread #6 to 1
Inside the parallel region, var gets modified by thread #1 to 1
Inside the parallel region, var gets modified by thread #5 to 1
Inside the parallel region, var gets modified by thread #7 to 1
Inside the parallel region, var gets modified by thread #2 to 1
Inside the parallel region, var gets modified by thread #4 to 1
Outside the parallel region, var  = 0
Outside the parallel region, var2 = 1

What I want is for var to keep the last value assigned to it inside the parallel region.

Since this is not a #pragma omp for loop, lastprivate is not valid here.

Outside the parallel region, var gets its original value, 0. A trick is to use a shared variable var2 to store the modified value from the master thread.

But this increases the overhead and doesn't seem to be an elegant approach. And if I want the value modified by the last thread rather than the master (e.g. to find out which thread finishes last), this trick will not work.

I'm quite new to OpenMP, so I might be missing something. In case I'm not, is there any way to get over this tricky thing?

Thank you very much.

Edit: My question is about how to retain the last value of a private variable after the parallel region finishes. Or if you could explain why lastprivate is conceptually not valid to use in a #pragma omp parallel, I'll take that as a perfect answer.

Max
  • remove "if(tid==0)" and you should get what you want (result from last running thread). – nat chouf Apr 19 '13 at 13:53
  • @natchouf removing that statement will cause all the threads to execute that instruction, which is an OK trick to know which thread executed it last, but my question is essentially: is there any mechanism similar to `lastprivate` that is legal to use inside `#pragma omp parallel`? So I wouldn't need to do any extra work to retain the value of `var`. – Max Apr 20 '13 at 05:03
  • I don't know any other way. – nat chouf Apr 21 '13 at 19:51

3 Answers


To find out which thread finished last, make each thread write its finishing time to an array. The array should have a size of at least omp_get_max_threads(). Index it using omp_get_thread_num() inside the parallel region.

Once the code leaves the parallel region, find the maximum value in the array.

Ideally, the array should be aligned and padded so that each element is in a separate cache line, so that the threads don't have to pass around a shared cache line when writing their finishing times.
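A minimal sketch of the padded-array idea described above. The names Slot, now_seconds and last_finished_thread are made up for illustration, the 64-byte cache line is an assumption about the target CPU, and std::chrono is used for the timestamp only so that the sketch also builds without OpenMP (omp_get_wtime() would work equally well inside the region).

```cpp
#include <chrono>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins so the sketch also builds without OpenMP enabled.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

// Assumption: 64-byte cache lines. alignas pads each slot to 64 bytes so
// that no two threads write to the same line.
struct alignas(64) Slot {
    double finish = -1.0;  // -1 marks a slot that no thread wrote
};

// Hypothetical helper: monotonic clock reading in seconds.
inline double now_seconds() {
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
}

// Hypothetical helper: runs a parallel region and returns the number of
// the thread that recorded the latest finishing time.
int last_finished_thread() {
    std::vector<Slot> slots(omp_get_max_threads());
    #pragma omp parallel
    {
        // ... the thread's real work would go here ...
        slots[omp_get_thread_num()].finish = now_seconds();
    }
    // Back in serial code: scan for the maximum finishing time.
    int last = 0;
    for (int i = 1; i < static_cast<int>(slots.size()); ++i)
        if (slots[i].finish > slots[last].finish) last = i;
    return last;
}
```

Each thread writes only its own slot, so no synchronization is needed inside the region. The scan happens after the implicit barrier at the end of the parallel region.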

If the parallel regions are at the outer level of the program, there is another more subtle way to do this by exploiting the fact that thread-private variables retain their value between top-level parallel regions. Below is an example of how to exploit this trick.

#include <omp.h>
#include <stdio.h>
#include <unistd.h>

double tmp;
#pragma omp threadprivate(tmp)

int main() {
    double start = omp_get_wtime();
#pragma omp parallel
    {
        sleep(1);
        tmp = omp_get_wtime();
    }
    double finish=start;
#pragma omp parallel reduction(max:finish)
    {
        if( tmp>finish ) finish = tmp;
    }
    printf("last thread took time = %g\n",finish-start);
}
Arch D. Robison
  • Thanks, but I didn't actually ask about finding out which thread finishes last; that's only an example. I would like to retain the last value assigned to a private variable. This is tricky because private variables will be overwritten by their original value as soon as the parallel section finishes, and the modifier `lastprivate`, which does exactly what I want, is not valid in `#pragma omp parallel` – Max Apr 20 '13 at 04:59
  • How are you defining "last" for a parallel region? – Arch D. Robison Apr 22 '13 at 14:22
  • @arch-d-robinson Whichever thread finishes last. But again, I'm not asking about this particular problem; it's only an example. – Max Apr 22 '13 at 20:30
  • "lastprivate" defines last as last in the equivalent serial execution. But parallel regions themselves don't have equivalent serial executions. Perhaps if you write the equivalent serial code, the meaning will be clear to us. – Arch D. Robison Apr 23 '13 at 14:45

I think there may be a misconception about the lastprivate clause. The OpenMP standard (4.0, section 2.14.3.5) says:

[...] when a lastprivate clause appears on the directive that identifies a worksharing construct [...], the value of each new list item from the sequentially last iteration of the associated loops, or the lexically last section construct, is assigned to the original list item.

Here, the term "list items" refers to the variables that you pass in the lastprivate clause. So in every case the value assigned to the original variable (which was declared as lastprivate) is well defined. In the case of loops, it is the value assigned in the last iteration of the equivalent sequential loop. In the case of sections, it is the last assignment to your variable in the lexically last section. This is also what you would expect from a serial program, so I think it's easy to see that there's no point in changing these semantics.
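To illustrate the loop case, here is a small sketch (the function name last_value_of_loop is made up for the example). It shows that the value copied back is the one from the sequentially last iteration, no matter which thread ran it or when that thread finished.

```cpp
// Returns the value left in `var` after a parallel loop with lastprivate.
// Assumes n > 0; with no iterations the result would be unspecified.
int last_value_of_loop(int n) {
    int var = -1;
    #pragma omp parallel for lastprivate(var)
    for (int i = 0; i < n; ++i) {
        var = i * i;  // written to the executing thread's private copy
    }
    // The private copy from iteration n-1 is copied back to the original
    // var, so this matches what the serial loop would leave behind.
    return var;
}
```

For example, last_value_of_loop(10) gives 81 whether the code is compiled with or without OpenMP, which is exactly the "same as the serial program" property described above.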

On the other hand, if lastprivate were permitted on something other than a worksharing (or SIMD) construct, as in your example, it would break these semantics. You cannot know in advance which thread will finish last, so the result would most probably change from one execution to the next (call it non-deterministic or even undefined, if you want). In contrast to the behavior described in the previous paragraph, that is most probably not what you would expect from a serial program. I hope this answers your question about the lack of lastprivate in non-worksharing constructs.

Regarding your example, I don't think that there is any built-in functionality in OpenMP that implements what you want to achieve. But since I'm also rather new to the topic, I can't say this definitively.
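Without a built-in clause, one possible workaround is to copy the value out by hand. The sketch below is only an illustration: LastWrite and run_region are made-up names, and "last" here means the last thread to enter the critical section, which is only a proxy for the thread that finished its work last.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-in so the sketch also builds without OpenMP enabled.
inline int omp_get_thread_num() { return 0; }
#endif

// Hypothetical record of whichever thread wrote last.
struct LastWrite {
    int tid;
    int value;
};

LastWrite run_region() {
    LastWrite last{-1, 0};          // shared: declared outside the region
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int var = 100 + tid;        // private: declared inside the region
        // ... the thread's real work on var would go here ...
        #pragma omp critical
        {
            // Every thread overwrites the shared record, so the thread
            // that enters this block last determines the final contents.
            last.tid = tid;
            last.value = var;
        }
    }
    return last;
}
```

The result differs from run to run, which is the non-determinism discussed above. The critical section keeps the two fields consistent with each other.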

By the way: You say

Outside the parallel region, var gets its original value, 0.

This may be a result of your OpenMP implementation. But in general the value of the original variable is undefined after a parallel region that has a private clause on this variable. So I would not take this for granted.

I hope this answers your question.

andreee

Each thread gets its own private variable. The private clause is just causing confusion here. If you want the last private value in a team of threads, the only thing that makes sense is to return one value per thread. You can do that like this:

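int *vala;
int nthreads;
#pragma omp parallel
{
    int ithread = omp_get_thread_num();
    #pragma omp single
    {
        // one thread sets the shared values; the others wait at the
        // implicit barrier at the end of the single block
        nthreads = omp_get_num_threads();
        vala = new int[nthreads];
    }
    vala[ithread] = ithread;
}
// with 8 threads: vala[] = 0,1,2,3,4,5,6,7
delete[] vala;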

In general, though, it's a bad idea to allocate the memory yourself. You should let each thread allocate memory for its own private variables. The problem is that the code above has false sharing at both the cache-line level (64 bytes) and the page level (4096 bytes). One way to fix this is to avoid writing to vala inside a parallel loop and instead write to it only once per thread. For example:

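int *vala;
#pragma omp parallel
{
    int nthreads = omp_get_num_threads();
    int ithread = omp_get_thread_num();
    #pragma omp single
    vala = new int[nthreads];   // implicit barrier after single
    int val = 0;                // private to each thread
    #pragma omp for
    for(int i=0; i<n; i++) {    // n: iteration count, defined elsewhere
        val = i;
    }
    vala[ithread] = val;        // one write per thread, not per iteration
}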

That still has false sharing but the effect is insignificant because it's done per thread and not per iteration.

In a year of using OpenMP, I can't think of a single case where I needed to know the time at which the last thread exited a parallel section. However, the order has been important, for example when an operation is associative but not commutative (such as a series of matrix multiplications). In that case you can fill the arrays as a function of thread number and rely on the fact that chunks with static scheduling are assigned in order of increasing thread number (see "C++ OpenMP: Split for loop in even chunks static and join data at the end").
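As a rough sketch of that ordering trick, string concatenation can stand in for matrix multiplication, since it is also associative but not commutative. The function name concat_in_order is made up for the example. It relies on schedule(static) with no chunk size, which gives each thread at most one contiguous chunk, assigned in order of thread number.

```cpp
#include <string>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins so the sketch also builds without OpenMP enabled.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

// Concatenates parts[0] + parts[1] + ... in parallel, preserving order.
std::string concat_in_order(const std::vector<std::string>& parts) {
    // One partial result per possible thread; unused slots stay empty.
    std::vector<std::string> partial(omp_get_max_threads());
    const int n = static_cast<int>(parts.size());
    #pragma omp parallel
    {
        std::string local;  // private accumulator for this thread's chunk
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; ++i)
            local += parts[i];
        // One write per thread, indexed by thread number.
        partial[omp_get_thread_num()] = local;
    }
    // Join serially in increasing thread number, which matches the order
    // of the contiguous chunks handed out by schedule(static).
    std::string result;
    for (const std::string& p : partial)
        result += p;
    return result;
}
```

With a dynamic or guided schedule the chunks would no longer line up with thread numbers, so the final join could scramble the order.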

Z boson