1

I want to parallelize a for loop which contains a nested comparison function for qsort:

#include    <stdio.h>
#include    <stdlib.h>
#include    <omp.h>

int main(){
    int i;
#pragma omp parallel for
    for(i = 0; i < 100; i++){
        int *index= (int *) malloc(sizeof(int)*10);
        double *tmp_array = (double*) malloc(sizeof(double)*10);
        int j;
        for(j=0; j<10; j++){
            tmp_array[j] = rand();
            index[j] = j;
        }
        // QuickSort the index array based on tmp_array:
        int simcmp(const void *a, const void *b){
            int ia = *(int *)a;
            int ib = *(int *)b;
            if ((tmp_array[ia] - tmp_array[ib]) > 1e-12){
                return -1;
            }else{
                return 1;
            }
        }
        qsort(index, 10, sizeof(*index), simcmp);
        free(index);
        free(tmp_array);
    }
    return 0;
}

When I try to compile this, I get the error:

internal compiler error: in get_expr_operands, at tree-ssa-operands.c:881
 }

As far as I can tell, this error is due to the nested comparison function. Is there a way to make openmp work with this nested comparison function? If not, is there a good way to achieve a similar result without a nested comparison function?

Edit: I'm using GNU C compiler where nested functions are permitted. The code compiles and runs fine without the pragma statement. I can't define simcmp outside of the for loop because tmp_array would then have to be a global variable, which would mess up the multi-threading. However, if somebody has a suggestion to achieve the same result without a nested function, that would be most welcome.

Agargara
  • 902
  • 11
  • 24
  • 2
    What is the purpose of nesting that function in that way? Why don't you define the function outside of other blocks? – Yunnosch Jan 09 '19 at 08:51
  • You want to sort `index` based on tmp_array values, but `index` is not initialized – Mathieu Jan 09 '19 at 09:04
  • Note that nested function definitions aren't permitted in standard C. – Cubic Jan 09 '19 at 15:04
  • @Yunnosch If you define the comparison function outside, tmp_array would need to be a global variable, which would screw up the multi-threading. – Agargara Jan 10 '19 at 00:06
  • @Mathieu sorry, my original code initializes index. I'll edit the question. – Agargara Jan 10 '19 at 00:06
  • @HighPerformanceMark code compiles fine without the openmp. – Agargara Jan 10 '19 at 00:06
  • The strangely named `tmp_array` variable is a single pointer, which can easily be given as an additional parameter. What else keeps you from avoiding compiler specific extensions? – Yunnosch Jan 10 '19 at 00:20
  • @Yunnosch are you saying there's a way to pass `tmp_array` to simcmp or qsort? How exactly would one do this? – Agargara Jan 10 '19 at 00:23
  • `double *tmp_array = (double*) malloc(sizeof(double)*10);` So it is a single variable of pointer to double type. I really do not get what your problem is. (you can and should skip the casting by the way.) – Yunnosch Jan 10 '19 at 01:01
  • @Yunnosch The problem is the comparison function needs to be able to see `tmp_array`. If I move the comparison function outside of the for loop, `tmp_array` will no longer be in scope. – Agargara Jan 10 '19 at 04:25
  • The parameter you give to a function will always be visible to the function. How could it not be. – Yunnosch Jan 10 '19 at 09:54
  • @Yunnosch I think you are fundamentally misunderstanding the problem. With qsort, you can only pass one pointer: the array to be sorted, in this case `index`. I would also need to pass a pointer to `tmp_array` because I am sorting `index` based on the corresponding values in `tmp_array`. Hence my solution below which uses qsort_r, which allows you to pass a pointer to additional data. – Agargara Jan 11 '19 at 10:11
  • @Yunnosch this github page summarizes the problem https://github.com/noporpoise/sort_r: "If you want to qsort() an array with a comparison operator that takes parameters you need to use global variables to pass those parameters (not possible when writing multithreaded code), or use qsort_r/qsort_s which are not portable (there are separate GNU/BSD/Windows versions and they all take different arguments)." – Agargara Jan 11 '19 at 10:33

2 Answers2

2

I realize this has been self answered, but here are some standard C and OpenMP options. The qsort_r function is a good classic choice, but it's worth noting that qsort_s is part of the c11 standard, and thus is portable wherever c11 is offered (which does not include Windows, they don't quite offer c99 yet).

As to doing it in OpenMP without the nested comparison function, still using original qsort, there are two ways that come to mind. First is to use the classic global variable in combination with OpenMP threadprivate:

static int *index = NULL;
static double *tmp_array = NULL;

#pragma omp threadprivate(index, tmp_array)

int simcmp(const void *a, const void *b){
    int ia = *(int *)a;
    int ib = *(int *)b;
    double aa = ((double *)tmp_array)[ia];
    double bb = ((double *)tmp_array)[ib];
    if ((aa - bb) > 1e-12){
        return -1;
    }else{
        return 1;
    }
}

int main(){
    int i;
#pragma omp parallel for
    for(i = 0; i < 100; i++){
        index= (int *) malloc(sizeof(int)*10);
        tmp_array = (double*) malloc(sizeof(double)*10);
        int j;
        for(j=0; j<10; j++){
            tmp_array[j] = rand();
            index[j] = j;
        }
        // QuickSort the index array based on tmp_array:
        qsort_r(index, 10, sizeof(*index), simcmp, tmp_array);
        free(index);
        free(tmp_array);
    }
    return 0;
}

The version above causes every thread in the parallel region to use a private copy of the global variables index and tmp_array, which takes care of the issue. This is probably the most portable version you can write in standard C and OpenMP, with the only likely incompatible platforms being those that do not implement thread local memory (some microcontrollers, etc.).

If you want to avoid the global variable and still have portability and use OpenMP, then I would recommend using C++11 and the std::sort algorithm with a lambda:

std::sort(index, index+10, [=](const int& a,  const int& b){
    if ((tmp_array[a] - tmp_array[b]) > 1e-12){
        return -1;
    }else{
        return 1;
    }
});
Tom Scogland
  • 937
  • 5
  • 12
  • Thank you for providing some good alternative solutions. Would using `threadprivate` negatively impact performance compared to qsort_r or the C++11 solution? – Agargara Jan 13 '19 at 03:35
  • 1
    Accessing a thread private does incur some cost, depending on the implementation that can be calculating an offset or equivalent to an indirect call. Either way while it’s measurable, it’s at worst similar to the cost classic qsort has for using function pointers to the comparator. The C++ solution, since all calls are static, should be able to optimize out all the calls entirely. All algorithms and else being equal the C++ version should perform best, with the qsort variants all within a small margin of each other on most platforms. – Tom Scogland Jan 13 '19 at 04:28
  • I implemented `std::sort` in my program, and noticed that the comparison function caused my program to hang because it did not provide a strict weak ordering, which std::sort requires. (See https://stackoverflow.com/questions/18291620/why-will-stdsort-crash-if-the-comparison-function-is-not-as-operator) Changing the comparison function to `return -1 * ((tmp_array[a] - tmp_array[b]) > 1e-12);` fixed this problem. In terms of performance, I found that `std::sort` was not noticeably faster than `qsort_r` even when calling `sort` 100,000 times. – Agargara Jan 15 '19 at 05:07
1

I solved my problem with qsort_r, which allows you to pass an additional pointer to the comparison function.

#define _GNU_SOURCE
#include    <stdio.h>
#include    <stdlib.h>
#include    <omp.h>

int simcmp(const void *a, const void *b, void *tmp_array){
    int ia = *(int *)a;
    int ib = *(int *)b;
    double aa = ((double *)tmp_array)[ia];
    double bb = ((double *)tmp_array)[ib];
    if ((aa - bb) > 1e-12){
        return -1;
    }else{
        return 1;
    }
}

int main(){
    int i;
#pragma omp parallel for
    for(i = 0; i < 100; i++){
        int *index= (int *) malloc(sizeof(int)*10);
        double *tmp_array = (double*) malloc(sizeof(double)*10);
        int j;
        for(j=0; j<10; j++){
            tmp_array[j] = rand();
            index[j] = j;
        }
        // QuickSort the index array based on tmp_array:
        qsort_r(index, 10, sizeof(*index), simcmp, tmp_array);
        free(index);
        free(tmp_array);
    }
    return 0;
}

This compiles and runs with no issue. However, it is not completely ideal as qsort_r is platform and compiler dependent. There is a portable version of qsort_r here where the author summarizes my problem nicely:

If you want to qsort() an array with a comparison operator that takes parameters you need to use global variables to pass those parameters (not possible when writing multithreaded code), or use qsort_r/qsort_s which are not portable (there are separate GNU/BSD/Windows versions and they all take different arguments).

Agargara
  • 902
  • 11
  • 24