My best shot at performance without getting overboard with special algos.
The Erathostenes' seive - the complexity of the below is O(N*log(log(N))) - because the inner j
loop starts from i*i
instead of i
.
#include <vector>
using std::vector;
void erathostenes_sieve(size_t upToN, vector<size_t>& primes) {
primes.clear();
vector<bool> bitset(upToN+1, true); // if the bitset[i] is true, the i is prime
bitset[0]=bitset[1]=0;
// if i is 2, will jump to 3, otherwise will jump on odd numbers only
for(size_t i=2; i<=upToN; i+=( (i&1) ? 2 : 1)) {
if(bitset[i]) { // i is prime
primes.push_back(i);
// it is enough to start the next cycle from i*i, because all the
// other primality tests below it are already performed:
// e.g:
// - i*(i-1) was surely marked non-prime when we considered multiples of 2
// - i*(i-2) was tested at (i-2) if (i-2) was prime or earlier (if non-prime)
for(size_t j=i*i; j<upToN; j+=i) {
bitset[j]=false; // all multiples of the prime with value of i
// are marked non-prime, using **addition only**
}
}
}
}
Now factoring based on the primes
(set in a sorted vector). Before this, let's examine the myth of sqrt
being expensive but a large bunch of multiplications is not.
First of all, let us note that sqrt is not that expensive anymore: on older CPU-es (x86/32b) it used to be twice as expensive as a division (and a modulo
operation is division), on newer architectures the CPU costs are equal. Since factorisation is all about %
operations again and again, one may still consider sqrt
now and then (e.g. if and when using it saves CPU time).
For example consider the following code for an N=65537
(which is the 6553-th prime) assuming the primes
has 10000 entries
size_t limit=std::sqrt(N);
size_t largestPrimeGoodForN=std::distance(
primes.begin(),
std::upper_limit(primes.begin(), primes.end(), limit) // binary search
);
// go descendingly from limit!!!
for(int i=largestPrimeGoodForN; i>=0; i--) {
// factorisation loop
}
We have:
- 1 sqrt (equal 1
modulo
),
- 1 search in
10000
entries - at max 14 steps, each involving 1 comparison, 1 right-shift division-by-2 and 1 increment/decrement - so let's say a cost equal with 14-20 multiplications (if ever)
- 1 difference because of
std::distance
.
So, maximal cost - 1 div and 20 muls? I'm generous.
On the other side:
for(int i=0; primes[i]*primes[i]<N; i++) {
// factorisation code
}
Looks much simpler, but as N=65537
is prime, we'll go through all the cycle up to i=64
(where we'll find the first prime which cause the cycle to break) - a total of 65 multiplications.
Try this with a a higher prime number and I guarantee you the cost of 1 sqrt+1binary search are better use of the CPU cycle than all the multiplications on the way in the simpler form of the cycle touted as a better performance solution
So, back to factorisation code:
#include <algorithm>
#include <math>
#include <unordered_map>
void factor(size_t N, std::unordered_map<size_t, size_t>& factorsWithMultiplicity) {
factorsWithMultiplicity.clear();
while( !(N & 1) ) { // while N is even, cheaper test than a '% 2'
factorsWithMultiplicity[2]++;
N = N >> 1; // div by 2 of an unsigned number, cheaper than the actual /2
}
// now that we know N is even, we start using the primes from the sieve
size_t limit=std::sqrt(N); // sqrt is no longer *that* expensive,
vector<size_t> primes;
// fill the primes up to the limit. Let's be generous, add 1 to it
erathostenes_sieve(limit+1, primes);
// we know that the largest prime worth checking is
// the last element of the primes.
for(
size_t largestPrimeIndexGoodForN=primes.size()-1;
largestPrimeIndexGoodForN<primes.size(); // size_t is unsigned, so after zero will underflow
// we'll handle the cycle index inside
) {
bool wasFactor=false;
size_t factorToTest=primes[largestPrimeIndexGoodForN];
while( !( N % factorToTest) ) {
wasFactor=true;// found one
factorsWithMultiplicity[factorToTest]++;
N /= factorToTest;
}
if(1==N) { // done
break;
}
if(wasFactor) { // time to resynchronize the index
limit=std::sqrt(N);
largestPrimeIndexGoodForN=std::distance(
primes.begin(),
std::upper_bound(primes.begin(), primes.end(), limit)
);
}
else { // no luck this time
largestPrimeIndexGoodForN--;
}
} // done the factoring cycle
if(N>1) { // N was prime to begin with
factorsWithMultiplicity[N]++;
}
}