I know there's a few posts about how to implement wheels, but I'm really struggling to see how to implement one with my current approach to the sieve.
I created my own bit array in C, with the following implementation:
#define setBit1(Array, i) (Array[i/INT_BITS] |= (1 << i % INT_BITS))
#define getBit(Array, i) ((Array[i/INT_BITS] & (1 << i % INT_BITS)) ? 1 : 0)
#define setBit0(Array, i) (Array[i/INT_BITS] &= ~(1 << i % INT_BITS))
int * createBitArray(unsigned long long size) {
// find bytes required, round down to nearest whole byte
unsigned long long bytesRequired = size / BITS_PERBYTE;
// round up to highest multiple of 4 or 8 bytes (one int)
bytesRequired = (sizeof(int) * (bytesRequired / sizeof(int) +
((size % BITS_PERBYTE * sizeof(int) == 0) ? 0 : 1)));
// allocate array of "bits", round number of ints required up
return (int *)malloc((bytesRequired));
}
I've done a few tests in C using the clock(), and I've found that for large array sizes of more than 1,000,000, the bit array, even with its bit manipulations, is at least 200% faster than an int array. It also uses 1/32 of the memory.
#define indexToNum(n) (2*n + 1)
#define numToIndex(n) ((n - 1) / 2)
typedef unsigned long long LONG;
// populates prime array through Sieve of Eratosthenes, taking custom
// odd keyed bit array, and the raw array length, as arguments
void findPrimes(int * primes, LONG arrLength) {
long sqrtArrLength = (long)((sqrt((2 * arrLength) + 1) - 1) / 2);
long maxMult = 0;
long integerFromIndex = 0;
for (int i = 1; i <= sqrtArrLength; i++) {
if (!getBit(primes, i)) {
integerFromIndex = indexToNum(i);
maxMult = (indexToNum(arrLength)) / integerFromIndex;
for (int j = integerFromIndex; j <= maxMult; j+= 2) {
setBit1(primes, numToIndex((integerFromIndex*j)));
}
}
}
}
I've been populating the bit array with the index, i, representing a number obtained through (2i + 1). This has the benefit of reducing any time spent iterating over even numbers, and once again reducing the necessary memory of the array by half. 2 is manually added to the primes after. This comes with a consequence of the time spent translating between index and number, but with my tests, for more than 1,000 primes, this approach is faster.
I'm stumped on how I could further optimize; I've reduced array size, I'm only testing to sqrt(n), I'm starting the "sieving" of primes upwards from p * p, I've eliminated all evens, and it's still taking me about 60 seconds for the first 100,000,000 primes in C.
As far as I'm aware, the "wheel" method requires that the actual integers of numbers be stored in the index. I'm really stuck on implementing it with my current bit array.