0

I am doing multithreaded merge-sort with C++. The result is very strange, adding more threads decreases the running time even when the number to be sorted is very large (5 milliion). I expect that adding more threads will decrease the running time. I think there are some mistakes in my code. Could anyone help me?

    #include <iostream>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <time.h>

    // number of elements in array
    // number of threads

    int MAX;
    int THREAD_MAX;
    // array bellow is to store the right edge of a divided array
    int index[20] = {0};
    int p_i = 0;


    using namespace std;


    //int *a;
    int part = 0;


    void fillData(int a[], int len)
    {
        // Create random arrays
        int i;
        for (i = 0; i<len; i++)
            a[i] = rand();
        return;
    }


    // merge function for merging two parts
    void merge(int *a, int low, int mid, int high)
    {
        // We have low to mid and mid+1 to high already sorted.
        int i, j, k;
        int * temp = new int[high - low + 1];
        i = low;
        k = 0;
        j = mid + 1;

        // Merge the two parts into temp[].
        while (i <= mid && j <= high)
        {
            if (a[i] < a[j])
            {
                temp[k] = a[i];
                k++;
                i++;
            }
            else
            {
                temp[k] = a[j];
                k++;
                j++;
            }
        }

        // Insert all the remaining values from i to mid into temp[].
        while (i <= mid)
        {
            temp[k] = a[i];
            k++;
            i++;
        }

        // Insert all the remaining values from j to high into temp[].
        while (j <= high)
        {
            temp[k] = a[j];
            k++;
            j++;
        }


        // Assign sorted data stored in temp[] to a[].
        for (i = low; i <= high; i++)
        {
            a[i] = temp[i - low];
        }
    }




    // merge sort function
    void merge_sort(int *a, int low, int high)
    {
        // calculating mid point of array
        int mid = low + (high - low) / 2;
        if (low < high) {

            // calling first half
            merge_sort(a, low, mid);

            // calling second half
            merge_sort(a, mid + 1, high);

            // merging the two halves
            merge(a, low, mid, high);
        }
    }

    // thread function for multi-threading
    void* merge_sort(void* arr)
    {
        int *a = (int *)(arr);
        int thread_part = part++;

        // calculating low and high


        int low = thread_part * (MAX / THREAD_MAX);
        int high = (thread_part + 1) * (MAX / THREAD_MAX) - 1;

        // allocate the rest part of original array to the last thread
        if(thread_part == THREAD_MAX -1){
                high = MAX - 1;
        }
        // store the right edge of each divided array
        index[++p_i] = high;

        // evaluating mid point
        int mid = low + (high - low) / 2;

            merge_sort(a, low, mid);
            merge_sort(a, mid + 1, high);
            merge(a, low, mid, high);

    }


    void isSorted(int *a, int len)
    {
        if (len == 1)
        {
            printf("Sorting Done Successfully\n");
            return;
        }

        int i;
        for (i = 1; i<len; i++)
        {
            if (a[i]<a[i - 1])
            {
                printf("Sorting Not Done\n");
                return;
            }
        }
        printf("Sorting Done Successfully\n");
        return;
    }



    // Driver Code
    int main()
    {


    /*  cout << "Enter No of elements of Array:";
        cin >> MAX;*/
        int length, i;
        cout << "\nEnter the number of data element to be sorted: ";
        cin >> MAX;

        int *a= new int[MAX];
        // Create a random array of given length
        srand(time(NULL));
        fillData(a, MAX);



        cout << "Enter No of Thread:";
        cin >> THREAD_MAX;


    /*  // generating random values in array
        a = new int[MAX];
        srand(time(NULL));
        for (int i = 0; i < MAX; i++)
        {
            a[i] = rand();
        }*/

        // t1 and t2 for calculating time for merge sort

        clock_t t1 = clock();
        pthread_t threads[THREAD_MAX];

// creating threads
for (int i = 0; i < THREAD_MAX; i++)
{
    pthread_create(&threads[i], NULL, merge_sort, (void*)a);
}


    // joining all threads
    for (int i = 0; i < THREAD_MAX; i++)
        {
        pthread_join(threads[i], NULL);
        }
        clock_t t2 = clock();
        // merging the final parts
        int p = 1;
        int mid, high;
        for(int q = 1; q < THREAD_MAX; q++)
        {   

            mid = index[p];
            p++;
            high = index[p];
            merge(a, 0, mid, high);         
        }


        cout << "Time taken: " << (double)(t2 - t1)/ CLOCKS_PER_SEC << endl;
        isSorted(a, MAX);



        // displaying sorted array
        /*cout << "Sorted array: ";
        for (int i = 0; i < MAX; i++)
            cout << a[i] << " ";*/


        return 0;
    }
Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
MegaMM
  • 31
  • 3
  • How many CPU cores ya got and how many threads are you starting? Threads are great until they start fighting it out with one another for CPU time. Too many threads on too few CPU cores and you'll wind up spending more time swapping tasks in and out than you will processing jobs. – user4581301 Feb 22 '18 at 06:46
  • Unrelated: `pthread_t threads[THREAD_MAX];` counts on your compiler supporting Variable Length Arrays. The C++ standard keeps leaving [VLA out for some very good reasons](https://stackoverflow.com/questions/1887097/why-arent-variable-length-arrays-part-of-the-c-standard). – user4581301 Feb 22 '18 at 06:51
  • 1
    I did this already. It was fun. Some friendly advice .. Don't use `new` or C arrays. Use `std::vector` instead Don't roll your own merge. Use `std::merge`. Don't use p_thread. Use `std::thread` and `join()`. Done correctly, the program should run not quite N times as fast using N threads. – Jive Dadson Feb 22 '18 at 07:05
  • 1
    Your solution is not scalable, since final `merge` is performed by a single thread only. That is where _parallel multi-way merge_ is useful, which is quite complex algorithm with some heuristics. (This is similar to parallel quicksort, you need parallel partitioning for scalability. And, parallel partitioning is anything but simple.) – Daniel Langr Feb 22 '18 at 11:26
  • 1
    Moreover, I guess there is a _data race_ in `int thread_part = part++;`. You need atomic memory operation here. I don't know how to use atomics in Pthreads, in C++11 its via `std::atomic`, in OpenMP via `#pragma omp atomic ...` directive. – Daniel Langr Feb 22 '18 at 11:29

1 Answers1

1

Don't allocate memory in merge, allocate one large array of size MAX in advance and use it within merge_sort.

Your merge function is called a lot of times, moreover from multiple threads. Each new modifies free store (heap), which is typically done in some form of critical section. There are scalable allocators, such as malloc from Intel TBB for this purpose. However, there is no reason for such allocation in mergesort implementation.

Moreover you don't delete[] temp in your merge function. There are many more problems with your code, such as data race (int thread_part = part++;). You perform final merge only sequentially, which hinders scalability. Efficient implementations of mergesort (and quicksort) typically uses insertion sort at the end of recursion (e.g., below 10 elements), since it is faster than calling a function. Etc...

Daniel Langr
  • 22,196
  • 3
  • 50
  • 93