My goal is to write a program that takes a large list of unsorted integers (1 to 10 million of them), divides it into 6 parts, and has a separate thread sort each part concurrently. After sorting, I merge the parts into one sorted array so I can find the median and mode more quickly.
The input file will be something like this:
# 1000000
314
267
213
934
where the number following the # identifies the number of integers in the list.
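As a minimal sketch (not part of the program below), the header line could be parsed with `sscanf` instead of copying characters by hand. `parse_header` is a hypothetical helper introduced only for illustration, and it assumes the first line always has the form `# <count>`:

```c
#include <stdio.h>

/* Hypothetical helper: parse a header line of the form "# <count>".
   Returns the count, or -1 if the line does not match that form. */
int parse_header(const char *line)
{
    int n;
    /* '#' must match literally; the space in the format skips any
       whitespace (including none) before the number. */
    if (sscanf(line, "# %d", &n) != 1 || n < 0)
        return -1;
    return n;
}
```

With the sample file above, passing the first line to `parse_header` would yield the element count, while a plain data line such as `314` would be rejected.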
Currently I can sort correctly and quickly without threading; however, when I added threading I ran into an issue. For a data set of 1,000,000 integers, it only sorts about the first 833,333 integers, leaving about the last 166,666 (1/6) unsorted.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#define BUF_SIZE 1024
int sum; /* this data will be shared by the thread(s) */
int * bigArr;
int size;
int findMedian(int array[], int size)
{
    if (size % 2 != 0)
        return array[size / 2];
    return (array[(size - 1) / 2] + array[size / 2]) / 2;
}
/*compare function for quicksort*/
int _comp(const void* a, const void* b) {
    return ( *(int*)a - *(int*)b);
}
/*This function is the problem method*/
/*indicate range of array to be processed with the index(params)*/
void *threadFct(int param)
{
    int x= size/6;
    if(param==0)x= size/6;
    if(param>0&&param<5)x= (size/6)*param;
    if(param==5)x= (size/6)*param+ (size%size/6);/*pass remainder into last thread*/
    qsort((void*)bigArr, x, sizeof(bigArr[param]), _comp);
    pthread_exit(0);
}
int main(int argc, char *argv[])
{
    FILE *source;
    int i =0;
    char buffer[BUF_SIZE];
    if(argc!=2){
        printf("Error. please enter ./a followed by the file name");
        return -1;
    }
    source= fopen(argv[1], "r");
    if (source == NULL) { /*reading error msg*/
        printf("Error. File not found.");
        return 1;
    }
    int count= 0;
    while (!feof (source)) {
        if (fgets(buffer, sizeof (buffer), source)) {
            if(count==0){ /*Convert string to int using atoi*/
                char str[1];
                sprintf(str, "%c%c%c%c%c%c%c%c%c",buffer[2],buffer[3],buffer[4],buffer[5],buffer[6],buffer[7],buffer[8],buffer[9],buffer[10]);/*get string of first */
                size= atoi(str); /* read the size of file--> FIRST LINE of file*/
                printf("SIZE: %d\n",size);
                bigArr= malloc(size*sizeof(int));
            }
            else{
                //printf("[%d]= %s\n",count-1, buffer); /*reads in the rest of the file*/
                bigArr[count-1]= atoi(buffer);
            }
            count++;
        }
    }
    /*thread the unsorted array*/
    pthread_t tid[6]; /* the thread identifier */
    pthread_attr_t attr; /* set of thread attributes */
    // qsort((void*)bigArr, size, sizeof(bigArr[0]), _comp); <---- sorts array without threading
    for(i=0; i<6;i++){
        pthread_create(&tid[i], NULL, &threadFct, i);
        pthread_join(tid[i], NULL);
    }
    printf("Sorted array:\n");
    for(i=0; i<size;i++){
        printf("%i \n",bigArr[i]);
    }
    fclose(source);
}
To clarify, the problem is in my threadFct() function.
To explain what the function is meant to do: param (the thread number) identifies which chunk of the array to quicksort. I divide the size into 6 parts, and because the size may not divide evenly by 6, the leftover numbers go into the last chunk. For example, with 1,000,000 integers, the first 5 threads would each sort 166,666 integers and the last thread would sort the remainder (166,670).
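The partitioning described above can be sketched as follows. This only illustrates the intended chunk boundaries and is not the code from my program: `chunk_bounds` and `NUM_CHUNKS` are hypothetical names, and it assumes chunk `k` starts at offset `k * (n / 6)` with the last chunk also taking the `n % 6` leftover elements.

```c
#include <stddef.h>

#define NUM_CHUNKS 6  /* assumed number of threads/chunks */

/* Hypothetical helper: compute the start offset and length of chunk k
   (0 <= k < NUM_CHUNKS) in an array of n elements. Every chunk gets
   n / NUM_CHUNKS elements; the last one also takes the
   n % NUM_CHUNKS leftover elements. */
void chunk_bounds(size_t n, int k, size_t *start, size_t *len)
{
    size_t base = n / NUM_CHUNKS;
    *start = base * (size_t)k;
    *len = (k == NUM_CHUNKS - 1) ? base + n % NUM_CHUNKS : base;
}
```

For n = 1,000,000 this gives chunks 0 to 4 a length of 166,666 each, and chunk 5 a start of 833,330 with a length of 166,670. Under this scheme a thread would sort only its own range, for example with `qsort(bigArr + start, len, sizeof *bigArr, _comp)`.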
I am aware that:
- Multi-threading will not speed this up much, even for 10 million integers
- This is not the most efficient way to find the median/mode
Thanks for reading; any help is greatly appreciated.