
I have a program that works with arrays and outputs a single number. To parallelize it I use OpenMP, but after adding the directives the program started producing answers that differ from those of the version without parallelization. Can anyone tell me where I made a mistake?

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <math.h>
#include <float.h>

#define LOOP_COUNT 100
#define MIN_1_ARRAY_VALUE 1
#define MAX_1_ARRAY_VALUE 10
#define MIN_2_ARRAY_VALUE 10
#define MAX_2_ARRAY_VALUE 100
#define PI 3.1415926535897932384626433
#define thread_count 4

double generate_random(unsigned int* seed, int min, int max) {
    return (double)((rand_r(seed) % 10) + (1.0 / (rand_r(seed) % (max - min + 1))));
}
double map_1(double value) {
    return pow(value / PI, 3);
}
double map_2(double value) {
    return fabs(tan(value));
}
void dwarf_sort(int n, double mass[]) {
    int i = 1;
    int j = 2;
    while (i < n) {
        if (mass[i-1]<mass[i]) {
            i = j;
            j = j + 1;
        }
        else
        {
            double tmp = mass[i];
            mass[i] = mass[i - 1];
            mass[i - 1] = tmp;
            --i;
            if (i==0)
            {
                i = j;
                j = j + 1;
            }
        }
    }
}
int main(int argc, char* argv[]) {
    printf("lab work 3 in processing...!\n");

    int trial_counter;
    int array_size = atoi(argv[1]);

    struct timeval before, after;
    long time_diff;
    gettimeofday(&before, NULL);

    for (trial_counter = 0; trial_counter < LOOP_COUNT; trial_counter++) {
        double arr1[array_size];
        double arr2[array_size / 2];
        double arr2_copy[array_size / 2];
        double arr2_min = DBL_MAX;
        unsigned int tempValue = trial_counter;
        unsigned int *currentSeed = &tempValue;
        //stage 1 - init

#pragma omp parallel num_threads(thread_count)
        {

#pragma omp parallel for default(none) shared(arr1, currentSeed, array_size) schedule(guided, thread_count)
            for (int i = 0; i < array_size; i++) {
                arr1[i] = generate_random(currentSeed, MIN_1_ARRAY_VALUE, MAX_1_ARRAY_VALUE);
                // printf("arr[%d] = %f\n", i, arr1[i]);
            }
#pragma omp parallel for default(none) shared(arr2, arr2_copy, array_size, currentSeed, arr2_min) schedule(guided, thread_count)
            for (int i = 0; i < array_size / 2; i++) {
                double value = generate_random(currentSeed, MIN_2_ARRAY_VALUE, MAX_2_ARRAY_VALUE);
                arr2[i] = value;
                arr2_copy[i] = value;
                if (value < arr2_min) {
                    arr2_min = value;
                }
            }
#pragma omp parallel for default(none) shared(arr1, array_size) schedule(guided, thread_count)
            for (int i = 0; i < array_size; i++) {
                arr1[i] = map_1(arr1[i]);
            }
#pragma omp parallel for default(none) shared(arr2, arr2_copy, array_size) schedule(guided, thread_count)
            for (int i = 1; i < array_size / 2; i++) {
#pragma omp critical
                arr2[i] = map_2(arr2_copy[i] + arr2_copy[i - 1]);
            }
#pragma omp parallel for default(none) shared(arr2, arr1, array_size) schedule(guided, thread_count)
            for (int i = 0; i < array_size / 2; i++) {
                arr2[i] = pow(arr1[i], arr2[i]);
            }
#pragma omp parallel sections
            {
#pragma omp section
                {
                    dwarf_sort((int) array_size / 2, arr2);
                }
            }
            double final_sum = 0;
            for (int i = 0; i < array_size / 2; i++) {
                if (((int) arr2[i]) / 2 == 0) {
                    final_sum += sin(arr2[i]);
                }
            }
            // printf("Iteration %d, value: %f\n", trial_counter, final_sum);
        }
    }
    gettimeofday(&after, NULL);
    time_diff = 1000 * (after.tv_sec - before.tv_sec) + (after.tv_usec - before.tv_usec) / 1000;
    printf("\nN=%d. Milliseconds passed: %ld\n", array_size, time_diff);
    return 0;
}
  • Removing the C++ tag, as this is pure C (these are separate languages, and whilst OpenMP exists in both worlds, things would, from a design point of view, be handled differently than in your program). – Marcus Müller May 29 '22 at 14:14
  • oh, and by the way, `rand_r` is both not very fast and a pretty bad random generator. Also, the modulo method to limit the range of generated numbers will *not* yield good uniform samples. No offense (nothing that is usually well-taught!), but that `generate_random` is a bit of a "how to not do it" :) It doesn't matter much here, though (because the range is small, and your need for good randomness probably not very prominent) – Marcus Müller May 29 '22 at 14:18
  • @MarcusMüller. Thanks, I'll remember. But I need to use rand_r by assignment. The essence of the generator itself is to obtain pseudo-random numbers in order to check in the future whether the program is working correctly on one or more processors, as well as to check its execution speed. – Dmitrii Shurygin May 29 '22 at 14:27
  • As said, it's fine here :) It's more that this is a fairly complex program and I'm having a hard time understanding how it's parallelized. – Marcus Müller May 29 '22 at 14:29
  • FYI [How to generate random numbers in parallel?](https://stackoverflow.com/q/4287531/10107454) – paleonix May 30 '22 at 16:24
  • `omp parallel sections` with only one section doesn't make sense to me. – paleonix May 30 '22 at 16:25
  • Could you try to write better questions? 1. "the problem" is redundant because if you didn't have a problem you wouldn't be posting here 2 "with OpenMP" is almost redundant since you include the openmp tag, but if it's not, "with parallelizing" is redundant. 3 "the program" is very likely redundant too. If you do parallel programming there is always a program involved. How about "why do I get different random numbers in parallel"? – Victor Eijkhout May 30 '22 at 18:12

1 Answer


rand_r is thread-safe only if each thread has its own seed or if threads are guaranteed to operate on the seed in an exclusive way (e.g. using an expensive critical section). This is not the case in your code: currentSeed is shared between threads, so you have a race condition since multiple threads can mutate it simultaneously. You need to use a thread-private seed, with a different value per thread so that the threads do not all generate the same sequence. The seed of each thread can be initialized from a shared array filled by the main thread (e.g. naively [0, 1, 2, 3, etc.]).
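
For example, a minimal sketch of the first initialization loop rewritten this way (it reuses the question's generate_random, uses the thread id directly as the naive per-thread seed instead of a shared seed array, and needs #include <omp.h> for omp_get_thread_num):

#include <omp.h>

#pragma omp parallel num_threads(thread_count)
{
    // Each thread owns its seed; the initial value is naively the thread id,
    // so rand_r never sees a seed that another thread can mutate.
    unsigned int my_seed = (unsigned int) omp_get_thread_num();
#pragma omp for schedule(guided, thread_count)
    for (int i = 0; i < array_size; i++) {
        arr1[i] = generate_random(&my_seed, MIN_1_ARRAY_VALUE, MAX_1_ARRAY_VALUE);
    }
}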

The thing is that you will still get different results when using a different number of threads with this approach. One solution is to split your data set into independent chunks, each with an associated seed, and then possibly compute the chunks in parallel.
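
A sketch of that idea, assuming a hypothetical CHUNK_COUNT that is fixed independently of the thread count (so the generated values, and hence the final result, no longer depend on how many threads run the loop):

#define CHUNK_COUNT 64  // hypothetical: fixed regardless of the thread count

int chunk_size = (array_size + CHUNK_COUNT - 1) / CHUNK_COUNT;
#pragma omp parallel for schedule(dynamic)
for (int c = 0; c < CHUNK_COUNT; c++) {
    // One independent seed per chunk, naively the chunk index.
    unsigned int chunk_seed = (unsigned int) c;
    int begin = c * chunk_size;
    int end = begin + chunk_size < array_size ? begin + chunk_size : array_size;
    for (int i = begin; i < end; i++) {
        arr1[i] = generate_random(&chunk_seed, MIN_1_ARRAY_VALUE, MAX_1_ARRAY_VALUE);
    }
}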

Note that using a #pragma omp parallel for inside a #pragma omp parallel region creates a nested parallel region, so many more threads are created than intended (i.e. over-subscription). This is generally very inefficient. You should use #pragma omp for instead.
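
Concretely, inside your existing #pragma omp parallel num_threads(thread_count) region, each inner loop should look like this (a sketch for the map_1 loop; the other loops follow the same pattern):

#pragma omp parallel num_threads(thread_count)
{
    // "omp for" shares the iterations among the threads of the enclosing
    // parallel region instead of spawning a new team of threads.
#pragma omp for schedule(guided, thread_count)
    for (int i = 0; i < array_size; i++) {
        arr1[i] = map_1(arr1[i]);
    }
    // ... the other loops, each with its own "#pragma omp for" ...
}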

Jérôme Richard