0

I am attempting to create a multi-threaded Sieve of Eratosthenes

The number of threads are set by default to 4, though they can be specified by the user as a command line argument.

I'm attempting to mark all intervals of the prime within the array concurrently with several different threads. So, the array containing 0 or 1 is split into arrayElements/threadNumber.

Each thread has a specified starting position and ending position of the array for it to check for intervals of the prime within.

So, for instance let's assume that the number you want to go up to is 20, and you have 4 threads. Thread 0 will start at 0 and go up to 20/4-1. The next will start at 20/4*threadNumber, and go up to (20/4*nextThreadNumber)-1 and so on.

I then have to check to see if the prime that was found is within the array area of one of the threads. This is because if it IS, that prime CANNOT be marked as nonprime. I was having a problem because the primes, after out of the bounds of the first thread, would be marked as nonprime because the prime number divides itself.

As you can see below, I find if the startingPosition is divisible by the prime. If it is, I set that as that thread's "nonPrime seeking" starting point, and increment by the prime from there, marking each instance within range as nonprime. If it isn't, then I calculate what the next non-prime is (based off the prime number) and put that as the start.

At the end of all this, it's your usual "Loop through the array in intervals of primeNumber, marking each instance as nonprime".

Long story short, I have to get this working with numbers up to the range of a 32 bit integer (Which is 2billion or so). It works fine with smaller numbers, but after doing some benchmarking tests at 1.4 million numbers it takes 13.4 seconds. For 5.4 million it takes 37.3 seconds. For 10 million it takes 68 seconds. For the 2billion, it just keeps working (I've let it run for 10 minutes or more) with no end in sight.

So, how can I improve my code? What is causing it to take so long? It doesn't seem to be any faster than a single threaded implementation (I set the thread argument to 1 for single threaded, and it takes 56 seconds for 10 million numbers)

So, here is the code, with

define maxNum 10483646

Threaded function:

   int numThreads; //number of threads
    int innerCounter;
    int composite[maxNum];


//need to find all prime numbers up to unsigned 32 bit integer
//creating n threads, (start to 1/n -1) 0,  (1/n to 2/n -1) , (2/n to 3/n -1) until it's (n-1/n to n/n) are starting positions for looking for primes so threads aren't accessing same area
void* markPrimes(int i){
    //Prime number should be innerCounter
    //printf("Threaded process: %d\n", i);
    //starting position in array: (maxNum/threadNum) * i
    //ending position in array: ((maxNum/threadNum)) * (i+1) - 1
    int startingPosition;
    int compositeCounter;
    int firstNonPrime;
    int endingPosition;
    int primeInRange;


        startingPosition = (double)(maxNum/numThreads) * i;
        endingPosition = (double)(maxNum/numThreads) * (i+1)-1;
        if(i == numThreads-1){
            endingPosition = maxNum;
        }

        if(startingPosition <= innerCounter && innerCounter <= endingPosition){ //the prime number is in range, and should be ignored
            primeInRange = 1;
        }

        firstNonPrime = startingPosition%innerCounter;
        if(firstNonPrime != 0){
            int temp = innerCounter - firstNonPrime;
            firstNonPrime = temp + startingPosition;
        }else{
            firstNonPrime = startingPosition;
        }
        if(primeInRange == 1){
            firstNonPrime = innerCounter + innerCounter;
        }

    if(firstNonPrime <= endingPosition){


        for(compositeCounter = firstNonPrime; compositeCounter <= endingPosition; compositeCounter += innerCounter){

                        composite[compositeCounter] = 1;

                    }
    }
    return (void*)0;
}

And now the main function which has the rest of the algorithm and creates the threads:

int main(int argc, char** argv[]){

    clock_t start; //start time
    clock_t stop; //end time
    double total_time;
    int rc;
    int nextNum;
    int prevNum = 0;
    int i;
    int numPrimes;
    //unsigned int maxNum = INT_MAX; //maximum unsigned integer value to go up until
    //bit array for threads to check primes for
    for(i = 0; i < maxNum+1; i++){
        composite[i] = 0;
    }
    if(argc > 1){
        numThreads = atoi(argv[1]); //argument given for n number of threads
    }else{
        numThreads = 4; //default if no argument given is 4 threads
    }
    pthread_t threads[numThreads]; //array of threads
    start = clock(); //start timing

    //Sieve algorithm here! When prime found, spawn threads!
    int outerCounter = 1;
    while(outerCounter < sqrt(maxNum)){
        //searching numbers above the current for prime numbers

        for(innerCounter = outerCounter+1; innerCounter <= maxNum; innerCounter++){
            //not composite

            if(composite[innerCounter] == 0){
                //setting all multiples of innerCounter to 1, creating threads to split up the work!

                for(i = 0; i < numThreads; i++){

                    rc = pthread_create(&threads[i], NULL, markPrimes, (void*) i);

                    //Detecting Error
                    if(rc){
                        //perror("Thread creation error!");
                        //exit(-1);
                    }
                }
                for(i = 1; i < numThreads; i++){
                    pthread_join(threads[i], NULL);
                }
                outerCounter = innerCounter;
                numPrimes++;

            }
        }
    }

    stop = clock(); //stop timing
    total_time = (double)(stop - start) / CLOCKS_PER_SEC;


    printf("Time for threads: %.5f\n", total_time);
    printf("Number of primes: %d\n", numPrimes-1);

    return 0;
}

I thank you all in advance for your patience and assistance!

Edit: I must use pthreads

Edit 2: I used How to sleep or pause a PThread in c on Linux as an example to try and guide some conditional locking and unlocking. Since I do essentially want to pause and unpause the marking of primes. I put the while statement (With the lock and unlock statements) in my threaded function as a group above where the start/stop sections are discovered. I put the marking of an int to 1 when a prime is found inside my inner if statement of the algorithm with lock/unlock statements, and set that variable to 0 just outside of that if statement with lock/unlock statements.

Is that what I'm supposed to be doing?

Community
  • 1
  • 1
FeralShadow
  • 259
  • 1
  • 4
  • 14
  • 2
    Creating a thread is not a fast operation, that is, in your fast algorithm you do not want to create 10000000 threads or else all the potential benefits of using threads will be lost. Not sure if that is your case, though... – rodrigo Mar 04 '13 at 20:00
  • 1
    You need to create your 4 threads once, before you start calculating anything. Creating 4 threads in the inner loop of 3 nested loops is not the way to parallelze an algorithm. Thread creating/destruction overhead is _very_ large compared to the work you do in each thread. – nos Mar 04 '13 at 20:16
  • Ok, I understand the reasoning. I'm trying to figure out where to spawn them though. I have to spawn them prior to the algorithm, but they can only run when I know what prime number I have to mark its intervals as non-prime for... And, they can't exit when all the non-primes of one prime are marked, they have to stick around ... until the full range has been tested for primes? – FeralShadow Mar 04 '13 at 20:24
  • `int composite[maxNum];` that's a **lot** of memory. Use one bit per number to reduce the memory pressure. – Daniel Fischer Mar 04 '13 at 20:56
  • In response to that, I'm actually going to implement a bitmap with an 8 bit integer array. Right now, I'm only trying to get my threading process improved. – FeralShadow Mar 04 '13 at 20:57
  • I would seriously consider a work-crew approach if this is really what you want to do. A single-threaded bull-headed sieve will beat the hell out of the numbers you currently have. My 2-year-old mackbook air with a non-optimized generic C-algorithm will find all primes below 400-million (all 21336327 of them) in about 10 seconds. Any reasonably hand-optimized algorithm will beat the hell out of *that*. Remember your Knuth =P – WhozCraig Mar 04 '13 at 21:20
  • @FeralShadow You can spawn the threads prior running your algorithm, you just need to wake them up and tell them to perform work, and signal back to main() that the current work is finished, which you can do with pthread condition variables. – nos Mar 04 '13 at 21:28
  • Could you perhaps explain a bit more? My professor has only explained how to spawn pthreads and has left the rest up to us. Though he does prefer the Sieve of Eratosthenes since we coded that in a previous assignment. As a note, I'm now discovering that at the 10 million range the primes returned changes slightly from one run to the next. What you're suggesting, nos, is to suspend the threads then run them whenever a new prime is found? That means I probably need some sort of conditional loop to ensure the threads don't exit early, yes? – FeralShadow Mar 04 '13 at 21:30
  • As another note, I must use pthreads. – FeralShadow Mar 04 '13 at 21:32

0 Answers0