1

I'm trying to find some primes with the Sieve of the greek guy algorithm. I have some efficiency concerns. Here's the code:

void check_if_prime(unsigned number)
{
    unsigned index = 0;
    while (primes[index] <= std::sqrt(number))
    {
        if (number % primes[index] == 0) return;
        ++index;
    }
    primes.push_back(number);
}

And, because I coded huge 2/3/5/7/11/13 prime wheel, the code is 5795 lines longs.

for (unsigned i = 0; i < selection; ++i)
{
    unsigned multiple = i * 30030;
    if (i!=0) check_if_prime( multiple+1 );
    check_if_prime ( multiple+17 );
    check_if_prime ( multiple+19 );
    check_if_prime ( multiple+23 );
    // ...so on until 30029
}

Optimization flags: -O3, -fexpensive-optimizations, -march=pentium2

25 million primes in 20 minutes with CPU stuck at 50% (no idea why, tried real time priority but it didn't change much). Size of output text file is 256MB (going to change to binary later on).

  • Compilation takes ages! Is it okay? How can I make it faster without compromising efficiency?
  • Is that if statement at the start of for loop OK? I've read if statements take the longest.
  • Anything else concerning the code, not the algorithm? Anything to make it faster? What statements are faster than others?
  • Would even a bigger wheel (up to 510510, not just 30030, hell a lot of lines) compile within a day?

I want to find all primes up to 2^32 and little optimizations would save some hours and electricity. Thank you in advance!

EDIT: not seeking for an algorithm, seeking for code improvement if there can be done any!

  • 6
    I'm going to call it Sieve Of [The Greek Guy](https://en.wikipedia.org/wiki/Eratosthenes) from now on. – tux3 Mar 05 '15 at 20:55
  • Sorry, he's got a really complicated name, both writing and pronunciation, and I'm not a native speaker. –  Mar 05 '15 at 20:57
  • 2
    possible duplicate of [Which is the fastest algorithm to find prime numbers?](http://stackoverflow.com/questions/453793/which-is-the-fastest-algorithm-to-find-prime-numbers) – Ken White Mar 05 '15 at 20:58
  • ... and he works in the local takeaway - no end to his talents – Ed Heal Mar 05 '15 at 20:58
  • 1
    *CPU stuck at 50%* Dual-core ? – High Performance Mark Mar 05 '15 at 20:59
  • 3
    Note: if you mean "Sieve of Eratosthenes", then your problem is that this is not what you implemented. – Gort the Robot Mar 05 '15 at 20:59
  • On Linux/Debian you might install the `bsdgames` package and use `/usr/games/primes` (which is free software, so you could study its source code) – Basile Starynkevitch Mar 05 '15 at 21:00
  • 1
    I wonder how you can get almost 6k lines with this algorihm – 463035818_is_not_an_ai Mar 05 '15 at 21:00
  • Define stuck? How do you know it's not making progress? – harold Mar 05 '15 at 21:00
  • yes, dual core. 50% and not moving. Should I learn threading? –  Mar 05 '15 at 21:02
  • well, now i start to understand the 6k lines thing... templates can be used in a nice way to let the compiler calculate prime numbers. I guess you could get the same results and even more compile time within less than 100 lines of code – 463035818_is_not_an_ai Mar 05 '15 at 21:02
  • btw you could consider asking a more specific question. "Is it ok?" is quite subjective. – 463035818_is_not_an_ai Mar 05 '15 at 21:04
  • 1
    *Should I learn threading?* Only if you think that multicore and parallel computers are more than a passing fad. – High Performance Mark Mar 05 '15 at 21:05
  • Well if there is a way of changing it into something a little bit faster it's not OK. –  Mar 05 '15 at 21:06
  • 5
    It looks like you're trying to speed up your algorithm using loop-unrolling. If that's the case, you probably don't need to bother; most optimizing compilers will do that automatically for you during compilation. In any case a 5795-line loop-unroll is several orders of magnitude too large. – Jeremy Friesner Mar 05 '15 at 21:17
  • 2
    Btw expecting a major speedup without changing the algorithm is like expecting your car to go faster by replacing only its tires; you might get a small speedup through compiler tricks, but the only way to get a big speedup is by using a more efficient algorithm. – Jeremy Friesner Mar 05 '15 at 21:19
  • if it takes that long to compile, you might consider that part of the runtime cost. :) – Martin Serrano Mar 05 '15 at 22:04
  • 1
    I'm voting to close this question as off-topic because this is a question about optimizing working code. Check http://codereview.stackexchange.com/ – Baum mit Augen Mar 06 '15 at 00:08
  • 1
    @BaummitAugen Questions containing stub code or hypothetical code would be off-topic on [codereview.se]. – nhgrif Mar 06 '15 at 00:10
  • @nhgrif Yes, but he says he already has working code. He should post that there. I do not say he should migrate the question as is. – Baum mit Augen Mar 06 '15 at 00:12

3 Answers3

3

Here is what I can say about the performance of your program:

  1. Likely your main problem is the call to std::sqrt(). This is a floating point function that's designed for full precision of the result, and it definitely take quite a few cycles. I bet you'll be much faster if you use this check instead:

    while (primes[index]*primes[index] < number)
    

    That way you are using an integer multiplication which is trivial for modern CPUs.

  2. The if statement at the start of your for() loop is irrelevant to performance. It's not executed nearly enough times. Your inner loop is the while loop within check_if_prime(). That's the one you need to optimize.

  3. I can't see how you are doing output. There are ways to do output that can severely slow you down, but I don't think that's the main issue (if it is an issue at all).

  4. Code size can be an issue: your CPU has an instruction cache with limited capacity. If your 6k lines don't fit into the first level instruction cache, the penalty can be severe. If I were you, I'd reimplement the wheel using data instead of code, i. e.:

    unsigned const wheel[] = {1, 17, 19, 23, ...};    //add all your 6k primes here
    for (unsigned i = 0; i < selection; ++i)
    {
        unsigned multiple = i * 30030;
        for(unsigned j = 0; j < sizeof(wheel)/sizeof(*wheel); j++) {
            check_if_prime(multiple + wheel[j]);
        }
    }
    
cmaster - reinstate monica
  • 38,891
  • 9
  • 62
  • 106
  • The *while* statement can be optimized further - square root can be calculated once outside the loop: *while (primes[index] <= numRoot)* – SomeWittyUsername Mar 05 '15 at 21:28
  • *"... add all your 6k primes here"*, love it. I have seen data this huge hard coded except for font bitmap data on an embedded system. – Thomas Matthews Mar 05 '15 at 21:31
  • @ThomasMatthews It can be easily generated with your favorite scripting language or with a helper function in the original code :) – SomeWittyUsername Mar 06 '15 at 10:24
  • @cmaster It's not a compile time constant. It's a square root of a parameter to the function. Anyway, if the compiler had optimized the root calculation to outside the loop, there wouldn't be a problem from the beginning – SomeWittyUsername Mar 06 '15 at 10:26
  • @icepack My fault: I was looking at the wrong loop :-( Yes, the `std::sqrt()` call can alternatively be done once up front. I don't think that it will have much effect which of the two approaches (`sqrt()` up front or multiplication) you use, though. `sqrt()` up front has more overhead for the small primes (too few loop iterations), while the multiplication has more overhead for the large ones (slightly more complex loop condition). I went for the multiplication because its overhead is really small (on the order of one to three cycles), and it is a general technique to avoid drawing roots. – cmaster - reinstate monica Mar 06 '15 at 21:18
0

Get it running under a debugger, and single-step it, instruction by instruction, and at each point understand what it is doing, and why.

This makes you walk in the shoes of the CPU, and you will see all the silliness that nutty programmer is making you do, and you will see what you could do better.

That's one way to make your code go as fast as possible.

Program size, by itself, only affects speed if you've got it so fast that caching becomes an issue.

Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
0

Here's a stab at some techniques for checking if a number is a prime:

bool is_prime(unsigned int number) // negative numbers are not prime.
{
  // A data store for primes already calculated.
  static std::set<unsigned int> calculated_primes;

  // Simple checks first:
  // Primes must be >= 2.
  // Primes greater than 2 are odd.
  if (   (number < 2)
      || ((number > 2) && ((number & 1) == 0) )
  {
    return false;
  }

  // Initialize the set with a few prime numbers, if necessary.
  if (calculated_primes.empty())
  {
     static const unsigned int primes[] =
     { 2, 3, 5, 7, 13, 17, 19, 23, 29};
     static const unsigned int known_primes_quantity =
         sizeof(primes) / sizeof(primes[0]);
     calculated_primes.insert(&primes[0], &primes[known_primes_quantity]);
  }
  // Check if the number is a prime that is already calculated:
  if (calculated_primes.find(number) != calculated_primes.end())
  {
    return true;
  }

  // Find the smallest prime to the number:
  std::set<unsigned int>::iterator prime_iter =
      calculated_primes.lower_bound(number);
  // Use this value as the start for the sieve.
  unsigned int prime_candidate = *prime_iter;
  const unsigned int iteration_limit = number * number;
  while (prime_candidate < iteration_limit)
  {
     prime_candidate += 2;
     bool is_prime = true;
     for (prime_iter = calculated_primes.begin();
          prime_iter != calculated_primes.end();
          ++prime_iter)
     {
       if ((prime_candidate % (*prime_iter)) == 0)
       {
         is_prime = false;
         break;
       }
     }
     if (is_prime)
     {
       calculated_primes.insert(prime_candidate);
       if (prime_candidate == number)
       {
         return true;
       }
     }
  }
  return false;
}

Note: This is untested code but demonstrates some techniques for checking if a number is prime.

Thomas Matthews
  • 56,849
  • 17
  • 98
  • 154