I am attempting to create a multi-threaded Sieve of Eratosthenes
The number of threads are set by default to 4, though they can be specified by the user as a command line argument.
I'm attempting to mark all intervals of the prime within the array concurrently with several different threads. So, the array containing 0 or 1 is split into arrayElements/threadNumber.
Each thread has a specified starting position and ending position of the array for it to check for intervals of the prime within.
So, for instance let's assume that the number you want to go up to is 20, and you have 4 threads. Thread 0 will start at 0 and go up to 20/4-1. The next will start at 20/4*threadNumber, and go up to (20/4*nextThreadNumber)-1 and so on.
I then have to check to see if the prime that was found is within the array area of one of the threads. This is because if it IS, that prime CANNOT be marked as nonprime. I was having a problem because the primes, after out of the bounds of the first thread, would be marked as nonprime because the prime number divides itself.
As you can see below, I find if the startingPosition is divisible by the prime. If it is, I set that as that thread's "nonPrime seeking" starting point, and increment by the prime from there, marking each instance within range as nonprime. If it isn't, then I calculate what the next non-prime is (based off the prime number) and put that as the start.
At the end of all this, it's your usual "Loop through the array in intervals of primeNumber, marking each instance as nonprime".
Long story short, I have to get this working with numbers up to the range of a 32 bit integer (Which is 2billion or so). It works fine with smaller numbers, but after doing some benchmarking tests at 1.4 million numbers it takes 13.4 seconds. For 5.4 million it takes 37.3 seconds. For 10 million it takes 68 seconds. For the 2billion, it just keeps working (I've let it run for 10 minutes or more) with no end in sight.
So, how can I improve my code? What is causing it to take so long? It doesn't seem to be any faster than a single threaded implementation (I set the thread argument to 1 for single threaded, and it takes 56 seconds for 10 million numbers)
So, here is the code, with
define maxNum 10483646
Threaded function:
int numThreads; //number of threads
int innerCounter;
int composite[maxNum];
//need to find all prime numbers up to unsigned 32 bit integer
//creating n threads, (start to 1/n -1) 0, (1/n to 2/n -1) , (2/n to 3/n -1) until it's (n-1/n to n/n) are starting positions for looking for primes so threads aren't accessing same area
void* markPrimes(int i){
//Prime number should be innerCounter
//printf("Threaded process: %d\n", i);
//starting position in array: (maxNum/threadNum) * i
//ending position in array: ((maxNum/threadNum)) * (i+1) - 1
int startingPosition;
int compositeCounter;
int firstNonPrime;
int endingPosition;
int primeInRange;
startingPosition = (double)(maxNum/numThreads) * i;
endingPosition = (double)(maxNum/numThreads) * (i+1)-1;
if(i == numThreads-1){
endingPosition = maxNum;
}
if(startingPosition <= innerCounter && innerCounter <= endingPosition){ //the prime number is in range, and should be ignored
primeInRange = 1;
}
firstNonPrime = startingPosition%innerCounter;
if(firstNonPrime != 0){
int temp = innerCounter - firstNonPrime;
firstNonPrime = temp + startingPosition;
}else{
firstNonPrime = startingPosition;
}
if(primeInRange == 1){
firstNonPrime = innerCounter + innerCounter;
}
if(firstNonPrime <= endingPosition){
for(compositeCounter = firstNonPrime; compositeCounter <= endingPosition; compositeCounter += innerCounter){
composite[compositeCounter] = 1;
}
}
return (void*)0;
}
And now the main function which has the rest of the algorithm and creates the threads:
int main(int argc, char** argv[]){
clock_t start; //start time
clock_t stop; //end time
double total_time;
int rc;
int nextNum;
int prevNum = 0;
int i;
int numPrimes;
//unsigned int maxNum = INT_MAX; //maximum unsigned integer value to go up until
//bit array for threads to check primes for
for(i = 0; i < maxNum+1; i++){
composite[i] = 0;
}
if(argc > 1){
numThreads = atoi(argv[1]); //argument given for n number of threads
}else{
numThreads = 4; //default if no argument given is 4 threads
}
pthread_t threads[numThreads]; //array of threads
start = clock(); //start timing
//Sieve algorithm here! When prime found, spawn threads!
int outerCounter = 1;
while(outerCounter < sqrt(maxNum)){
//searching numbers above the current for prime numbers
for(innerCounter = outerCounter+1; innerCounter <= maxNum; innerCounter++){
//not composite
if(composite[innerCounter] == 0){
//setting all multiples of innerCounter to 1, creating threads to split up the work!
for(i = 0; i < numThreads; i++){
rc = pthread_create(&threads[i], NULL, markPrimes, (void*) i);
//Detecting Error
if(rc){
//perror("Thread creation error!");
//exit(-1);
}
}
for(i = 1; i < numThreads; i++){
pthread_join(threads[i], NULL);
}
outerCounter = innerCounter;
numPrimes++;
}
}
}
stop = clock(); //stop timing
total_time = (double)(stop - start) / CLOCKS_PER_SEC;
printf("Time for threads: %.5f\n", total_time);
printf("Number of primes: %d\n", numPrimes-1);
return 0;
}
I thank you all in advance for your patience and assistance!
Edit: I must use pthreads
Edit 2: I used How to sleep or pause a PThread in c on Linux as an example to try and guide some conditional locking and unlocking. Since I do essentially want to pause and unpause the marking of primes. I put the while statement (With the lock and unlock statements) in my threaded function as a group above where the start/stop sections are discovered. I put the marking of an int to 1 when a prime is found inside my inner if statement of the algorithm with lock/unlock statements, and set that variable to 0 just outside of that if statement with lock/unlock statements.
Is that what I'm supposed to be doing?