Simple parallel mergesort in C using hypercube structure

Question

I am attempting to write a simple parallel mergesort algorithm to help me understand the concept of hypercube communication. I have a C program that generates an array of random integers and assigns threads to sort the data. Each thread serially sorts its sublist of the data and then performs compare-exchange operations with the other threads to merge the sublists. Below is the thread function.

void *th_fun(void* args)
{
    int i, j;
    int my_id             = *((int*)args);

    int chunk_size        = N_DATA/N_THREADS;
    int start_index       = my_id*chunk_size;

    //set up the memory this thread will use
    th_mem[my_id].my_data = &data[start_index];
    th_mem[my_id].partner_data = malloc(sizeof(int)*chunk_size);
    for(i = 0; i < chunk_size; i++)
        th_mem[my_id].partner_data [i] = 0;

    //sort my data serially, should use a mergesort
    qsort((void*)th_mem[my_id].my_data, chunk_size, sizeof(int), compare);

    //wait for all threads to finish sorting
    barrier_simple();

    //compare-exchange accross hypercube
    int partner_id, op;
    int logth = log(N_THREADS)/log(2);
    for(i = 0; i < logth; i++)
    {
        partner_id = my_id ^ (int)pow(2,logth-i-1);

        //send my data to partner
        for(j = 0; j < chunk_size; j++)
        {
            th_mem[partner_id].partner_data[j] = th_mem[my_id].my_data[j];
        }

        //wait for all data to be sent
        barrier_simple();

        //get the lowest or highest elements between my_data, partner_data
        int x = my_id & (int)pow(2,logth-i-1);
        op = (x > 0 ? 1:-1);
        int* output = merge(th_mem[my_id].my_data, th_mem[my_id].partner_data, 
                            chunk_size, op);

        //save results
        for(j = 0; j < chunk_size; j++)
        {
            th_mem[my_id].my_data[j] = output[j];
        }
        free(output);

        //wait for all data to be merged
        barrier_simple();
    }

    free(th_mem[my_id].partner_data);
    pthread_exit(args);
}

Sample run with 12 data elements sorted by 4 threads(output is chunked to indicate which thread is responsible for that data):

./parallel_merge 12 4
Unsorted:
0:924 674 704 
1:495 554 645 
2:979 285 119 
3:878 80 620 

Serial sort: 
0:80 119 285 
1:495 554 620 
2:645 674 704 
3:878 924 979 

Parallel sort: 
0:80 119 285 
1:495 554 674 
2:620 645 704 
3:878 924 979

As you can see the parallel sort is not entirely correct(aside from performance considerations, ofc). I have done many gdb runs and cannot determine why the hypercube exchanges are not working out. Any help would be appreciated. My references were The Hypercube Template and Mergesort

Simple parallel mergesort in C using hypercube structure

0 Answers0