
I am using OpenMP to parallelize a for loop like so:

std::string stringType = "somevalue";
#pragma omp parallel for reduction(+ : stringType)
//a for loop here which every loop appends a string to stringType

The only way I can think of to do this is to convert to an int representation in some way first and then convert back at the end, but this has obvious overhead. Are there any better ways to perform this style of operation?

  • Provide more code, please. It is not clear how we can help you. – Dmytro Dadyka Dec 16 '18 at 21:18
  • What you want is not possible. The fact is that string concatenation is noncommutative and depends on order. You cannot control the execution order of threads. Please tell us what you want to do. – Dmytro Dadyka Dec 16 '18 at 21:26
  • @DmytroDadyka Actually, you don't need commutativity to perform a parallel reduction. You just need associativity. String concatenation is associative. However, string concatenation tends to allocate memory and *that* can trash any benefit you may hope from parallelism. – CygnusX1 Dec 16 '18 at 23:04
  • @CygnusX1 Without commutativity, the final result is undefined. If during the reduction you merge a, b, c, you can get "abc", "acb", etc. Or am I wrong? If this uncertainty is not important to the author, then it is not clear why he needs strings at all. – Dmytro Dadyka Dec 16 '18 at 23:17
  • Possible duplicate of [Elegant (and typical) workaround for OpenMP reduction on complex variable in C++?](https://stackoverflow.com/questions/7163562/elegant-and-typical-workaround-for-openmp-reduction-on-complex-variable-in-c) – Dmytro Dadyka Dec 16 '18 at 23:43
  • It looks like there is a way out for some compilers. I found an example https://gist.github.com/eruffaldi/7180bdec4c8c9a11f019dd0ba9a2d68c – Dmytro Dadyka Dec 16 '18 at 23:52
  • @DmytroDadyka The reduction algorithm I know groups *neighboring* elements in pairs and sums/concats independent pairs in parallel, while dependent results remain sequential. In `((a.b).(c.d))` (where `.` is the concatenation operator) I don't care if `(a.b)` was done before or after `(c.d)` - this can be parallelised. But final concatenation has to wait for its partial results. I will always get `abcd` and just that. – CygnusX1 Dec 17 '18 at 10:38
  • You need to set your own schedule manually, otherwise you will have no guarantee that a thread loops through - and concatenates - neighboring elements. – Brice Dec 17 '18 at 12:04
  • @CygnusX1 Uncertainty is okay with me, but a reduction over a string fails at compile time with a "not allowed" error on gcc/9.2.0 – kil47 May 17 '23 at 07:53

1 Answer


As mentioned in the comments, reduction assumes that the operation is associative and commutative. The values may be computed in any order and "accumulated" through any kind of partial results, and the final result will be the same.

There is no guarantee that an OpenMP for loop will distribute contiguous iterations to each thread unless the loop schedule explicitly requests that. There is no guarantee either that contiguous blocks will be distributed by increasing thread number (i.e. thread #0 might go through iterations 1000-1999 while thread #1 goes through 0-999). If you need that behavior, then you should define your own schedule.

Something like:

int N = 1000;
std::string globalString("initial value");

#pragma omp parallel shared(N, globalString)
{
    std::string localString; // empty string

    // Set the schedule manually: each thread takes one contiguous block
    int iterFrom = omp_get_thread_num() * (N / omp_get_num_threads());
    int iterTo;
    if (omp_get_num_threads() == omp_get_thread_num() + 1)
        iterTo = N; // last thread picks up the remainder
    else
        iterTo = (1 + omp_get_thread_num()) * (N / omp_get_num_threads());

    // Loop - concatenate a number of neighboring values in the right order
    // No #pragma omp for: each thread goes through the loop, but the loop
    // boundaries change according to the thread ID
    for (int ii = iterFrom; ii < iterTo; ii++) {
        localString += get_some_string(ii);
    }

    // Dirty trick to concatenate strings from all threads in the right order
    for (int ii = 0; ii < omp_get_num_threads(); ii++) {
        #pragma omp barrier
        if (ii == omp_get_thread_num())
            globalString += localString;
    }

}

A better way would be to have a shared array of std::string, each thread using one as a local accumulator. At the end, a single thread can run the concatenation part (and avoid the dirty trick and all its overhead-heavy barrier calls).

Brice
  • "reduction assumes that the operation is associative and commutative" - is that an assumption of OpenMP so that it has more freedom in assigning work to threads? Because I can easily implement a reduction assuming only associativity, say - in CUDA, or with plain `std::thread` - without a problem. (Making the latter benefit from vector instructions may be hard, and probably unnecessary) – CygnusX1 Dec 21 '18 at 18:42
  • Let's say that there is a limited set of operators supported for reduction through the `reduction(op:var)` clause, and they happen to be associative and commutative. Thanks to that, the schedule of a distributed for-loop can be chosen arbitrarily (and there is a whole set of clauses for that), and no synchronization is required (but for a single atomic operation per thread). If you want to do something else, you'll have to do it manually. It is not difficult to implement a reduction, but it constrains the schedule and/or requires some synchronization between threads – Brice Jan 02 '19 at 16:49