1

After I had read that the initial value of reduction variable is set according to the operator used for reduction, I decided that instead of remembering these default values it is better to initialize it explicitly. So I modified the code in question by Totonga as follows

const int num_steps = 100000;
double x, sum, dx = 1./num_steps;

#pragma omp parallel private(x) reduction(+:sum)
{
  sum = 0.;
  #pragma omp for schedule(static)
  for (int i=0; i<num_steps; ++i)
  {
    x = (i+0.5)*dx;
    sum += 4./(1.+x*x);
  }
}

But it turns out that no matter whether I write sum = 0. or sum = 123.456 the code produces the same result (used gcc-4.5.2 compiler). Can somebody, please, explain me why? (with a reference to openmp standard, if possible) Thanks in advance to everybody.

P.S. since some people object initializing reduction variable, I think it makes sense to expand a question a little. The code below works as expected: I initialize reduction variable and obtain result, which DOES depend on MY initial value

int sum;
#pragma omp parallel reduction(+:sum)
{
  sum = 1;
}
printf("Reduction sum = %d\n",sum);

The printed result will be the number of cores, and not 0.

P.P.S I have to update my question again. User Gilles gave an insightful comment: And upon exit of the parallel region, these local values will be reduced using the + operator, and with the initial value of the variable, prior to entering the section.

Well, the following code gives me the result 3.142592653598146, which is badly calculated pi instead of expected 103.141592653598146 (the initial code was giving me excellent value of pi=3.141592653598146)

const int num_steps = 100000;
double x, sum, dx = 1./num_steps;

sum = 100.;
#pragma omp parallel private(x) reduction(+:sum)
{
  #pragma omp for schedule(static)
  for (int i=0; i<num_steps; ++i)
  {
    x = (i+0.5)*dx;
    sum += 4./(1.+x*x);
  }
}
Community
  • 1
  • 1
John Smith
  • 1,027
  • 15
  • 31
  • I think you're looking for clause 2.14.3.6 of the latest OpenMP spec. Each thread gets a private variable called, as far as you need be concerned, `sum`, which is initialised, for `+`, to 0. Then the computation proceeds. Your attempt to initialise `sum` to some other value is futile. – High Performance Mark Oct 06 '15 at 10:25
  • 1
    If you find it too hard to remember what values OpenMP initialises reduction variables to, remember only one *the identity element for the reduction operation*. – High Performance Mark Oct 06 '15 at 10:30
  • I don't understand why you dislike so much my attempt to initialize a variable and at the same time are OK with operations I do on this variable later. This is very normal behavior for a programmer - initialize and then manipulate variables. The reduction clause promises me that when I exit the parallel section, it will collect all private copies into a single one, but I do not understand at which point it is decided that I can change its value. – John Smith Oct 06 '15 at 10:43
  • I added example in the question where I initialize reduction variable and everything works as expected, i.e. no matter what the default init value is. – John Smith Oct 06 '15 at 10:56
  • The result of your 2nd code snippet is undefined, since the "global" value of `sum` hasn't been set before entry in the `parallel` region. Add a `sum=0;` before the `parallel` and it becomes valid. – Gilles Oct 06 '15 at 10:58
  • in the 1st code snippet `sum` was not defined outside the parallel region either - you did not object this. why? – John Smith Oct 06 '15 at 11:07
  • I did! Simply you didn't see it: "The reason is that upon entry into the parallel region, sum hadn't been initialised." – Gilles Oct 06 '15 at 11:42
  • Gilles, I have updated the question with an example, which is not explained by your comment. – John Smith Oct 06 '15 at 12:17
  • 1
    *Well, the following code gives me the result 3.142592653598146, which is badly calculated pi instead of expected 103.141592653598146* I think one of the reasons for heat rising in these exchanges is that some of us think your expectations are incorrect while you appear to hold fast to the opposite view. Fair enough, but we (@Gilles can disagree if he wishes) think that your program is behaving in conformance with the OpenMP specification. – High Performance Mark Oct 06 '15 at 12:38
  • According to Gilles the last code in the question should give 100+pi (I have added initialization `sum=100` outside the parallel section as Gilles suggested). But, as I've said, it gives 3.142592653598146 instead. Do you agree at least here that I followed his advice? – John Smith Oct 07 '15 at 14:44

1 Answers1

4

Why would you want to do that? This is just begging with all your soul for troubles. The reduction clause and the way the local variables used are initialised are defined for a reason, and the idea is that you don't need to remember these initialisation value just because they are already right.

However, in your code, the behaviour is undefined. Let's see why...

Let's assume your initial code is this:

const int num_steps = 100000;
double x, sum, dx = 1./num_steps;

sum = 0.;
for (int i=0; i<num_steps; ++i) {
    x = (i+0.5)*dx;
    sum += 4./(1.+x*x);
}

Well, the "normal" way of parallelising it with OpenMP would be:

const int num_steps = 100000;
double x, sum, dx = 1./num_steps;

sum = 0.;
#pragma omp parallel for reduction(+:sum) private(x)
for (int i=0; i<num_steps; ++i) {
    x = (i+0.5)*dx;
    sum += 4./(1.+x*x);
}

Pretty straightforward, isn't it?

Now, when instead of that, you do:

const int num_steps = 100000;
double x, sum, dx = 1./num_steps;

#pragma omp parallel private(x) reduction(+:sum)
{
  sum = 0.;
  #pragma omp for schedule(static)
  for (int i=0; i<num_steps; ++i)
  {
    x = (i+0.5)*dx;
    sum += 4./(1.+x*x);
  }
}

You have a problem... The reason is that upon entry into the parallel region, sum hadn't been initialised. So when you declare omp parallel reduction(+:sum), you create a per-thread private version of sum, initialised to the "logical" initial value corresponding to the operator of you reduction clause, namely 0 here because you asked for a + reduction. And upon exit of the parallel region, these local values will be reduced using the + operator, and with the initial value of the variable, prior to entering the section. See this for reference:

The reduction clause specifies a reduction-identifier and one or more list items. For each list item, a private copy is created in each implicit task or SIMD lane, and is initialized with the initializer value of the reduction-identifier. After the end of the region, the original list item is updated with the values of the private copies using the combiner associated with the reduction-identifier

So in summary, upon exit you have the equivalent of sum += sum_local_0 + sum_local_1 + ... sum_local_nbthreadsMinusOne

Therefore, since in your code, sum doesn't have any initial value, its value upon exit of the parallel region isn't defined as well, and can be whatever...

Now let's imagine you did indeed initialise it... Then, if instead of using the right initialiser inside the parallel region (like your sum=0.; in the hereinabove code), you used for whatever reason sum=1.; instead, then the final sum won't be just incremented by 1, but by 1 times the number of threads used inside the parallel region, since the extra value will be counted as many times as there are of threads.

So in conclusion, just use reduction clauses and variables the "expected"/"naïve" way, that will spare you and the people coming after for maintaining your code a lot of troubles.


Edit: It looks like my point was not clear enough, so I'll try to explain it better:

this code:

int sum;
#pragma omp parallel reduction(+:sum)
{
  sum = 1;
}
printf("Reduction sum = %d\n",sum);

Has an undefined behaviour because it is equivalent to:

int sum, numthreads;
#pragma omp parallel
#pragma omp single
numthreads = omp_get_num_threads();

sum += numthreads; // value of sum is undefined since it never was initialised
printf("Reduction sum = %d\n",sum);

Now, this code is valid:

int sum = 0; //here, sum has been initialised
#pragma omp parallel reduction(+:sum)
{
  sum = 1;
}
printf("Reduction sum = %d\n",sum);

To convince yourself, just read the snippet of the standard I gave:

After the end of the region, the original list item is updated with the values of the private copies using the combiner associated with the reduction-identifier

So the reduction uses the combination of the private reduction variables and the original value to perform the final reduction upon exit. So if the original value wasn't set, the final value is undefined as well. And that's not because for some reason your compiler gives you a value that seems right, that the code is right.

Is that clearer now?

Gilles
  • 9,269
  • 4
  • 34
  • 53
  • 2
    For sure *Why would you want to do that ?* – High Performance Mark Oct 06 '15 at 10:29
  • I think it would be illustrative and simpler if you had showed how one could do a reduction by hand by setting `double sum_local = 0.;` inside the parallel region making it clear it's private and that `sum` is shared. – Z boson Oct 06 '15 at 10:58
  • Deat Gilles, this is a very nice answer, but either I do not understand it well or you do not understand the issue. Your point is simple - I am not allowed to initialize reduction variable and that is why the compiler ignores my initialization. But then the question is following: at which point am I allowed to manipulate this variable then? I have added a piece of code to the question which shows that initialization of reduction variable works in a different way. I think you answer does not explain the difference in 2 behaviors. – John Smith Oct 06 '15 at 10:58
  • Dear Z boson, see the second example, which I have added to the question - there is a clear difference in 2 behaviors. – John Smith Oct 06 '15 at 11:02
  • No, my point isn't that you cannot initialise the value the way you want, my point is that this is error prone... The obvious example is the one you gave yourself because you have to initialise **twice** the value: once outside the `parallel` region and once inside. That by wanting to force the second initialisation is dangerous because 1/ you might get it wrong and 2/ you might forget to do the first one (the outside one) as well – Gilles Oct 06 '15 at 11:04
  • Are you suggesting to initialize a reduction variable outside a parallel clause? This is nonsense. When I am in doubt, I open [this reference](https://computing.llnl.gov/tutorials/openMP/#REDUCTION) and read about reduction variable. And people, who understand the subject more or less, write that the reduction variable is, in fact, a private variable, therefore it is pointless to initialize it outside parallel clause, because the standard says that private variables do not retain its values when entering parallel region. – John Smith Oct 06 '15 at 11:16
  • I have tested your assumption that the reduction is done using the initial value **prior** to entering parallel section. It is wrong. The code below gives just bad value of `pi`, instead of giving `pi+100` _sum = 100.; #pragma omp parallel private(x) reduction(+:sum) { #pragma omp for schedule(static) for (int i=0; i – John Smith Oct 06 '15 at 11:50
  • sorry, it's not random, it is always a badly calculated pi of `3.142592653598146` – John Smith Oct 06 '15 at 12:04
  • 1
    Well, for me the code gives `314159.265360` with sum initialised to 0 and `314259.265360` with sum initialised to 100. Neither of the two is _pi_, but the former is about 100000x _pi_ and the later is exactly the former plus 100... I guess you do not print `sum`, but `sum*dx` and my point is valid, since in this case, your results just comply with mine. And that'll be my last comment or contribution to this post. – Gilles Oct 06 '15 at 12:18