Why is my Merge Sort with threads slower than basic recursive Merge Sort?

Question

I am a student and I need to implement MergeSort using threads. When I test the program, sorting with threads takes equal or more time, compared to recursive sorting.

We got premade file with basic MergeSort, and I am not allowed to change merge() or mergesort() fucntion, but I need to implement submergeSortThreads(). I am counting threads with semaphore(user decides how many threads will be made during the sorting), without semaphore array doesn't get sorted right, because of the race condition I think...

I'm running the program and reading numbers from file, with this command:

$ time cat rand_stevila.bin | ./mergeSort 10000

Threaded run (n == number of max threads allowed):

$ time cat rand_stevila.bin | ./mergeSort 10000 -t -n 5

Here are some parts of the code, that are important for the question:

    sem_t *sem_c;
    int *polje;
    unsigned int max_paralelizacij = 1000;

    struct{
        int *array;
        int min;
        int max;
        void(*function)(int *, int, int, int, int);
    }typedef thread_args;

    void* thread_fun(void* args){
        thread_args *thr_arg = (thread_args*)args;
        mergeSort((*thr_arg).array,(*thr_arg).min,(*thr_arg).max,(*thr_arg).function);
        return 0;
    }


    // simple recursive sort
    void submergeSortSimple(int* array, int min1, int max1, int min2, int max2){
        mergeSort(array, min1, max1, submergeSortSimple);
        mergeSort(array, min2, max2, submergeSortSimple);
    }

    //function that swaps submergeSortSimple(), while sorting with threads
    void submergeSortThread(int* array, int min1, int max1, int min2, int max2){
        if(sem_trywait(sem_c)==-1){ //if there is already max_threads, program continues with simple sorting
            mergeSort(array, min1, max1, submergeSortSimple);
            mergeSort(array, min2, max2, submergeSortSimple);
            return;
        }
            thread_args arg1;
            arg1.array = array;
            arg1.min = min1;
            arg1.max = max1;
            arg1.function = submergeSortThread;
            pthread_t tid1;
            pthread_create(&tid1, 0, &thread_fun, &arg1);

            if(sem_trywait(sem_c)==-1){
                pthread_join(tid1,NULL);
                mergeSort(array, min2, max2, submergeSortSimple);
                return;
            }
            pthread_t tid2;
            thread_args arg2;
            arg2.array = array;
            arg2.min = min2;
            arg2.max = max2;
            arg2.function = submergeSortThread;
            pthread_create(&tid2, 0, &thread_fun, &arg2);
            pthread_join(tid1,NULL);
            pthread_join(tid2,NULL);
            return;            
    }



    // must not be changed
    void mergeSort(int *array, int min, int max, void(*submergeSort)(int *, int, int, int, int) ){
        int mid;
        if(min < max){
            mid=(min+max)/2;
            submergeSort(array, min, mid, mid+1, max);

            merge(array, min, mid, max);
        }
    }

    // must not be changed
    void merge(int *arr, int min,int mid,int max)
    {
        // drugi korak algoritma mergeSort
        int *tmp = malloc((max-min+1)*sizeof(int));
        int i,j,k,m;
        j=min;
        m=mid+1;

        for(i=min; j<=mid && m<=max ; i++)
        {
            if(arr[j]<=arr[m])
            {
                tmp[i-min]=arr[j];
                j++;
            }        
            else
            {
                tmp[i-min]=arr[m];
                m++;
            }
        }
        if(j>mid)
        {
            for(k=m; k<=max; k++)
            {
                tmp[i-min]=arr[k];
                i++;
            }
        }
        else
        {
            for(k=j; k<=mid; k++)
            {
                tmp[i-min]=arr[k];
                i++;
            }
        }
        for(k=min; k<=max; k++){
            arr[k]=tmp[k-min];
        }
        free(tmp);
    }

    int main(int argc, char *argv[])
    {
        #define NO_PAR 0 
        #define THREAD_PAR 2
        int technique= NO_PAR;
        void (*submergeSortFun)(int *, int, int, int, int);
        submergeSortFun = submergeSortSimple;
        while(1){
            int c;
            c = getopt(argc, argv, "ptn:");
            if(c==-1){
                break;
            }
            switch(c){

                case 't':
                    technique = THREAD_PAR;
                    submergeSortFun = submergeSortThread;
                    break;
                case 'n':
                    max_paralelizacij = atoi(optarg);                  
                    break;
                default:
                    printHelp(argc, argv);
                    return 0;
            }
        }
        if(technique == THREAD_PAR){
            sem_unlink("/C\0");
            sem_c = sem_open("/C", O_RDWR|O_CREAT|O_EXCL, 0660, max_paralelizacij);
            if(sem_c == SEM_FAILED){
                perror("Napaka pri ustvarjanju semaforja!\n");
            }
        }

        int i;
        int size;
        int *arr;
        if(optind >= argc){
            printHelp(argc, argv);
            return -1;
        }

        size = atoi(argv[optind]);


        // TODO: inicializacija za razlicne tehnike 
        switch(technique){
            case NO_PAR:
                arr = malloc(sizeof(int)*size);
                break;

            case THREAD_PAR:
                arr = malloc(sizeof(int)*size);
                break;
        }
        char buffer[101];

        // reading numbers from file
        for(i=0; i < size; i+=1){
            read(0, &arr[i], 4);
        }

        //measure time and sorting
        clock_t start = clock();
        mergeSort(arr, 0, size-1, submergeSortFun);
        clock_t end = clock();
        float seconds = (float)(end - start) / CLOCKS_PER_SEC;


        for(i=0; i<size; i++){

            printf("%d ",arr[i]);

        }
        printf("\n");



        // TODO: ciscenje za razlicnimi tehnikami
        switch(technique){
            case NO_PAR:
                free(arr);
                sem_close(sem_c);
                break;
            case THREAD_PAR:
                free(arr);
                sem_close(sem_c);
                break;
        }
                printf("\n\n CAS : %f\n\n",seconds);

        return 0;
    }

About how much data to sort are we talking? Using threads comes with quite some overhead, the thread must be created with all its facilities and destroyed after running. You'll only profit from multiple threads if there is enough data such that speeding up can compensate the thread management costs... — Aconcagua, Nov 29 '19 at 12:44
Well `10000` is kinda small for array size so overhead of all thread creation might be more than actual sorting, try with larger arrays, something like `10000000` — Photon, Nov 29 '19 at 12:44
I'm sorting 100 1k 10k 100k and 1M number, sorting with threads is always slower, except in 1M number I get 1-5ms faster sorting. On the other hand, sorting with fork() (processes) I can sort much faster... I am also running the program on virtualbox using 4GB RAM and 2 CPE, if that can affect the spped of sorting? — Aljaž Gornik, Nov 29 '19 at 12:48
Offtopic: atoi usually is a bad choice for converting string to integers, as you cannot distinguish invalid input from a legally provided zero value. In given case, we might assume zero being invalid input anyway – but you should at least check (and for negative values, too, as you don't use unsigned int...). — Aconcagua, Nov 29 '19 at 12:48
Well, with 1M, you finally passed the point where the costs for creating the threads got amortised... On a POSIX system, threads and processes actually differ very little. Sorting with fork shouldn't have given you any speed-up compared to threads *unless* you did something fundamentally wrong. Apart from, these different processes use their own memory each (data to be sorted being copied), so you'll have difficulties to re-combine the results... — Aconcagua, Nov 29 '19 at 12:57
regarding; `struct{ int *array; int min; int max; void(*function)(int *, int, int, int, int); }typedef thread_args;` Your compiler should have told you this has the wrong syntax. Also, an anonymous struct will be a problem when debugging as most debuggers cannot display the individual fields without have a 'tag' name for the struct — user3629249, Nov 30 '19 at 03:42
there is absolutely no difference in the actions controlled by `NO_PAR` and `THREAD_PAR` so why have both? — user3629249, Nov 30 '19 at 03:54
As for your `time` readings, they are likely to be dominated by I/O. You are using a terribly inefficient method to read the data. Ditch the loop, use a single call to `read` for the entire array. — n. m. could be an AI, Nov 30 '19 at 08:14

score 0 · Answer 1 · answered Nov 29 '19 at 14:01

Let's observe the cost of creating a thread vs calling a function recursively.

When you create a thread, you create and initialize a separate Call Stack and Program Counter for that thread.

When you call a function, it adds to the top of your current call stack. It is a single entry for a single call. No separate Program Counter is needed.

Clearly, the overhead of creating a thread is much more than calling a function. Hence, despite order of running time being the same for both implementations, threaded is usually slower due to large overhead of creating a thread. A reasonable situation to use threads would be to divide a large workload and then run recursive cases for smaller arrays to prevent the Call Stacks from exploding in size.

Why is my Merge Sort with threads slower than basic recursive Merge Sort?

1 Answers1