2

I'm trying to figure this problem http://www.urionlinejudge.com.br/judge/problems/view/1032, where I need to find the prime numbers between 1 to 3501 the fastest way, since the time limit may not exceed 1 second.

The way I'm calculating these prime numbers is to check if they are prime until their square root, then eliminating the multiple of the first prime numbers [2, 3, 5, 7] to improve the performance of the algorithm. Yet, the time exceeds.

My code (takes 1.560s as the internal testing of the submission system)

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <iostream>
#include <set>

using namespace std;

set<int> circ;
set<int> primes;

/* Calculate List Primes */ 
void n_prime(int qtd){
    int a, flag=1, l_prime = 1;
    float n;
    for(int i=0;i<qtd;i++){
        switch (l_prime){
            case 1:
                l_prime = 2;
                break;
            case 2:
                l_prime = 3;
                break;
            default:
                while(1){
                    flag=1;
                    l_prime+=2;
                    if(l_prime>7)
                    while(l_prime%2==0||l_prime%3==0||l_prime%5==0||l_prime%7==0) l_prime++;
                    n=sqrt(l_prime);
                    for(a=2;a<=n;a++){
                        if(l_prime%a==0){
                            flag=0;
                            break;
                        }
                    }
                    if(flag) break;
                }
        }
        primes.insert(l_prime);
    }
}

/* Initialize set with live's */ 
void init_circ(int t){
    circ.clear();
    for(int a=0;a<t;a++){
        circ.insert(a+1);
    }
}

/* Show last live*/
void show_last(){
    for(set<int>::iterator it=circ.begin(); it!=circ.end(); ++it)
        cout << *it << "\n";
}

int main(){
    int n = 0;
    clock_t c1, c2;
    c1 = clock();
    n_prime(3501);
    while(scanf("%d", &n)&&n!=0){
        init_circ(n);
        int idx=0, l_prime,count = 0;
        set<int>::iterator it;
        set<int>::iterator np;
        np=primes.begin();
        for(int i=0;i<n-1;i++){
            l_prime=*np;
            *np++;
            idx = (idx+l_prime-1) % circ.size();
            it = circ.begin();
            advance(it, idx);
            circ.erase(it);
        }
        show_last();
    }
    c2 = clock();
    printf("\n\nTime: %.3lf", (double)(c2-c1)/CLOCKS_PER_SEC);
    return 0;
}
Jean Marcos
  • 167
  • 6
  • 2
    Just use the [sieve of eratosthenes](http://stackoverflow.com/questions/7921456/sieve-of-eratosthenes) – aaronman Aug 29 '13 at 19:00
  • 1
    Also consider the [Sieve of Atkin](http://en.wikipedia.org/wiki/Sieve_of_Atkin), which is an optimized version of the Sieve of Eratosthenes. But for your small number range, it's quite likely the speed difference is only trivial. – Cornstalks Aug 29 '13 at 19:35
  • BTW if you like problems like this you should really try [project euler](http://projecteuler.net) – aaronman Aug 29 '13 at 19:58

6 Answers6

4

The easiest way is the sieve of Eratosthenes, here's my implementation:

//return the seive of erstothenes
std::vector<int> generate_seive(const ulong & max) {
    std::vector<int> seive{2};
    std::vector<bool> not_prime(max+1);
    ulong current_prime = seive.back();
    bool done = false;
    while(!done){ 
        for (int i = 2; i * current_prime <= max;i++) {
            not_prime[i*current_prime]=true;
        }
        for (int j = current_prime+1;true;j++) {
            if (not_prime[j] == false) {
                if( j >= max) {
                    done = true;
                    break;
                }
                else {
                    seive.push_back(j);
                    current_prime = j;
                    break;
                }
            }
        }
    }
    return seive;
}

Generates all the prime numbers under max, BTW these are the times for my sieve and 3501 as the max number.

real    0m0.008s
user    0m0.002s
sys     0m0.002s
aaronman
  • 18,343
  • 7
  • 63
  • 78
  • By the way, you can optimize your code even further if you change the starting value for i to current_prime. The key observation for this change is that you are trying to compose factors of the type `i * current_prime` where `i < current_prime`, so i will have a prime factor p, which is less than current_prime, and thus the number `i * current_prime` will be marked as composite on the mark iteration for the prime p (which we had found earlier). However, you should be aware for an int overflow at the expression `i * current_prime <= max`. – yasen Aug 30 '13 at 08:46
2

There is a better algorithm for finding primes. Have you heard about Eratosthenes and his sieve?

Also, you are using tons of STL (i.e. the set<>) as well as remainder operations in your code. This is simply killing the speed of your program.

yasen
  • 1,250
  • 7
  • 13
  • 2
    There is nothing wrong with the STL, in most cases using it will greatly improve your code – aaronman Aug 29 '13 at 19:07
  • what do you mean by "module operations"? – David Brown Aug 29 '13 at 19:16
  • Do you know what [beautiful] structure lies behind set<>? The problem here is that it is completely irrelevant -- you can use a simple array. And it is a whole other topic whether STL improves your code. Certainly it helps you a lot. However, if you want the fastest solution, it is better to implement everything yourself. – yasen Aug 29 '13 at 19:18
  • @DavidBrown I meant remainder (or modulus) operations. Thanks, I'll correct it. – yasen Aug 29 '13 at 19:20
  • 2
    @yasen I wasn't arguing that you should use a set, but stl data structures generally do not slow down your code by any significant amount, any c++ expert will tell you to always prefer something from the standard library over a c style array – aaronman Aug 29 '13 at 19:24
  • @yasen: note that using the standard library allows the compiler to use certain intrinsics that it otherwise might not be able to if you "implement everything yourself." – Cornstalks Aug 29 '13 at 19:25
  • @yasen for an example use c-style arrays with my implementation of the sieve and see if it runs faster than mine – aaronman Aug 29 '13 at 19:26
  • @aaronman It depends on the input. For 3500 there won't be any *huge* difference. – yasen Aug 29 '13 at 19:31
  • @yasen well obviously, if your feeling bold write it and test it for 10000000 – aaronman Aug 29 '13 at 19:33
  • @aaronman For 10M your code is around 1.5 times faster than mine (300 ms vs. 470 ms). For 100M my code is around 1 sec faster than yours (~6 s vs. ~5 s). However, if I add the optimization I mentioned (as a comment on your accepted answer) and change the type if `i` to `long long`, then your code is around 0.5 s faster than mine :) So STL isn't that bad after all. – yasen Aug 30 '13 at 09:00
  • @yasen I can't say I'm surprised, ulong is a typedef BTW, I forgot to take it out when I posted it – aaronman Aug 30 '13 at 16:44
1

Some basic advice, then a basic (albeit untested) answer...

The advice: If you have one resource that's very limited, take advantage of other resources. In this case, since time is limited, take up a lot of space. Don't dynamically allocate any memory, make it all fixed length arrays.

The way I would do it is simply to have one boolean array and apply Aristophanes' sieve to it:

void findPrimes(int cap) { // set to void for now, since I don't know your return type
    bool[] possibilities = new bool[cap + 1]; // has to be dynamic so that you can scale for any top
    possibilities[0] = false;
    possibilities[1] = false;
    int found = 0;

    for (int i = 2; i < cap; ) {
        ++found;
        for (int j = i + i; j < cap; j += i) {
            possibilities[j] = false;
        }
        do {
            ++i;
        } while (!possibilities[i]);
    }

    // at this point, found says how many you've found, and everything in possibilities that is true is a factor.  Just return it however you need.

I see aaronman beat me to the punch with the sieve idea. Mine is a slightly different implementation (more exactly resembles the original sieve, using only addition), and uses less dynamic memory, so I'm still submitting it.

Scott Mermelstein
  • 15,174
  • 4
  • 48
  • 76
  • Anyone want to tell me why the -1? – Scott Mermelstein Aug 29 '13 at 19:40
  • Wasn't me but you did spell Eratosthenes completely wrong – aaronman Aug 29 '13 at 20:05
  • @aaronman Blame Google! :-> If you type in "sieve of ar", it will automatically complete it to "aristophanes" and link to the "sieve of eratosthenes" wiki page. I just assumed it was an alternate spelling, though apparently it's just a sufficiently common misspelling. (For what it's worth, I thought I was looking up the Sieve of Archimedes, and it at least corrected me that far...) – Scott Mermelstein Aug 29 '13 at 20:25
  • Lol, i highly doubt that was it anyway – aaronman Aug 29 '13 at 21:10
0

You can get quite a nice speedup by preexcluding even numbers, just change your increment to i += 2 and make sure you don't omit 2 in your result array. You might even think about trying to preexclude multiples of 3, but that starts getting dirty. Something along the lines:

for(long i = 1; i < qtd; i += 6) {
    //check if i is prime
    //check if i+4 is prime
}

This should be enough to get you below the limit.

cmaster - reinstate monica
  • 38,891
  • 9
  • 62
  • 106
0

The sieve of Eratosthenes is the right way to do it, as others have suggested. But the implementations I saw here were very complicated. The sieve is extremely easy to implement. E.g.:

std::vector<int> get_primes(uint max) {
  std::vector<int> primes;
  std::vector<bool> is_composite(max+1,false);
  for (uint i = 2; i <= max; ++i) {
    if (!is_composite[i]) {
      primes.push_back(i);
      for (uint j = i+i; j <= max; j += i) {
        is_composite[j] = true;
      }
    }
  }
  return primes;
}
JoshG79
  • 1,685
  • 11
  • 14
0

There are two big, technical performance drains in your code:

  1. You insert your primes into a vector. Whenever the vector exceeds its allocated size, it will get a new, larger buffer, copy everything over, and delete the old one. new is very expensive in terms of performance, even more expensive than the copying involved. You can avoid this by telling the vector to reserve() enough space.

  2. You use sqrt() in your inner loop. This too is a slow function, square the prime instead (takes only one cycle on modern hardware), it will be faster.

cmaster - reinstate monica
  • 38,891
  • 9
  • 62
  • 106
  • about 1: reallocation and copying happens so few times, that it is totally irrelevant for the final performance (like, milliseconds). Using `reserve()` can still be justified, just saying it will not really help with the slowness here. – hyde Aug 29 '13 at 19:40