Scan for quicksort splitting

Question

I'm trying to write quicksort with OpenMP, and I'm getting stuck in the splitting part. The literature says that this is a parallel prefix operation. Here's the relevant bit:

  vector<int> under_pivot(values.size()), over_pivot(values.size());
  int count_under=0, count_over=0;
#pragma omp parallel for \
        reduction(+:count_under,count_over) \
        reduction(inscan,+:under_pivot,over_pivot)
  for (int i=0; i<values.size(); i++) {
    auto v = values[i];
#   pragma omp scan inclusive(under_pivot,over_pivot)
    {
      under_pivot[i] = count_under;
      over_pivot[i]  = count_over;
    }
    count_under += 1 ? v<pivot : 0;
    count_over  += 1 ? v>pivot : 0;
  }

(Yes, I've defined the + operator for vectors). The problem seems to be the double reduction, if I understand the compiler error correctly:

quicksort.cxx:59:9: error: expected 'reduction' clause with the 'inscan' modifier
        reduction(+:count_under,count_over) \
        ^
quicksort.cxx:60:19: note: 'reduction' clause with 'inscan' modifier is used here
        reduction(inscan,+:under_pivot,over_pivot)
                  ^
1 error generated.

But I really need two reductions because the scalars are not in a scan: a scan only pertains to arrays.

Compiler explorer with full code: https://godbolt.org/z/6xPrnGjMY

I haven't done a scan in OpenMP before, but you seem to be building an exclusive scan using the `inclusive` clause. Wouldn't it make more sense to use `exclusive` and then take `int count_under = under_pivot.back() + some_function(values.back())` at the end? Or did I overlook something? I don't quite understand your ternary comparisons. — paleonix, Jul 12 '22 at 14:52
@paleonix The problem is that I need the counts to fill in the scanned array. Hm. Maybe first make a boolean array, and use that to create the scanned array. — Victor Eijkhout, Jul 12 '22 at 15:39
So [this](https://godbolt.org/z/Yzh71hEne) is not what you want? I just applied the exclusive scan example on page 426 [here](https://www.openmp.org/wp-content/uploads/openmp-examples-5-2.pdf). — paleonix, Jul 12 '22 at 19:44
And I'm pretty sure that `count_under += 1 ? v < pivot : 0;` is the same as `count_under += v < pivot;`. — paleonix, Jul 12 '22 at 19:46
@paleonix I thought I'd be explicit about the conversion between bool and int. But yes. — Victor Eijkhout, Jul 12 '22 at 19:47
@paleonix Let me try your code. I do find that I've been getting the `inscan` stuff completely backwards. — Victor Eijkhout, Jul 12 '22 at 19:48
Then it would be `count_under += v < pivot ? 1 : 0;`, that is why I was confused. — paleonix, Jul 12 '22 at 19:48
Just from comparing the two examples (inclusive and exclusive) in the linked document I must say that it is certainly quite confusing (e.g. the placement of the `scan` construct). — paleonix, Jul 12 '22 at 19:50
@paleonix Ok, so your code is code. I mostly messed up the directive: the reduction is on the variables, not the arrays. Btw, did you see that the reduction variables are zero after the loop? — Victor Eijkhout, Jul 12 '22 at 20:42
No, I never executed anything. That is also some weird behavior, but as you can always get the reduction by reading the last element of the scan, it's ok as long as you know it? — paleonix, Jul 12 '22 at 20:45
@paleonix That's what I thought, but the last element is not always correct. If all elements are over the pivot, then the array `under_pivot` is filled with zeros, so you'd think that the last element is zero, so location zero will be filled with a — Victor Eijkhout, Jul 12 '22 at 21:17

Victor Eijkhout · Answer 1 · 2022-07-14T12:03:12.247

With thanks to @paleonix, here is the correct code:

  int count_under=0, count_over=0;
  vector<int> under_pivot(values.size(),-1), over_pivot(values.size(),-1);
#pragma omp parallel for \
        reduction(inscan,+:count_under,count_over)
  for (int i=0; i<values.size(); i++) {
    under_pivot[i] = count_under;
    over_pivot[i]  = count_over;
#   pragma omp scan exclusive(count_under, count_over)
    {
      count_under += values[i]<pivot ? 1 : 0;
      count_over  += values[i]>pivot ? 1 : 0;
    }
  }
  int s = under_pivot.size();
  count_under = under_pivot.back()
    + ( under_pivot[s-2]==under_pivot[s-1] ? 1 : 0 );

  cout << "Splitting with " << count_under << " before the pivot\n";
  cout << "under: " << under_pivot << "\n";
  cout << "over:  " << over_pivot << "\n";

  vector<float> newvalues(values.size(),-1);
  for (int i=0; i<values.size(); i++) {
    auto v = values[i];
    int loc;
    try {
      if (v<pivot) {
        loc = under_pivot[i] ;
        newvalues.at(loc) = v;
      } else {
        loc = over_pivot[i]  + count_under;
        newvalues.at(loc) = v;
      }
    } catch (...) {
      cout << "element " << i << ": " << v << " maps to loc " << loc << "\n";
    }
  }

This seems to work but there are several strange aspects. For instance, why are the reduction variables zero after the reduction loop? I tried an inclusive scan and got nothing but rubbish.

EDIT I have an extremely hackish solution for count_under which compensates for some weird border case, but that's because clang13 has a bug in scan handling. Let me find a better compiler.

Anyway, parallel quick sort, here we come.....

If I understand the standard right, the `scan` construct is not something that acts on the line/block below, but instead cuts the encompassing block into two blocks. Therefore the braces around the updates of `count_under` and `count_over` might not be needed. I also checked this in my godbolt sample and getting rid of the braces makes no difference in the compiled code. — paleonix, Jul 13 '22 at 13:11
See [here](https://godbolt.org/z/8EGjjWPWT). With this in mind the construct is also less confusing, i.e. `inclusive` or `exclusive` differ by which of the code above/below the pragma is the "reduction" and which is writing out the result. — paleonix, Jul 13 '22 at 13:16
@paleonix Thanks. I've since also confirmed that my version of clang 13 has a bug. The reduction variables should not be zero after the loop. Gcc12 has it fixed. I can not get godbolt to compile this with any clang. Undoubtedly that also explains my confusion about the exclusive code, because I certianly tried put ting the reduction after the scan. — Victor Eijkhout, Jul 13 '22 at 14:21

Scan for quicksort splitting

1 Answers1