The PRIME1 problem at SPOJ (Sphere Online Judges) is designed so that you cannot simply sieve up to the upper limit, because in that case you will get hit by the timeout.
One possible approach is superior speed; by adding a few bells and whistles to the standard sieve it can be made to run fast enough to stay well below the timeout limit. Speed optimisations include:
- representing only the odd integers in the sieve (50% space savings)
- sieving in small, cache-friendly segments that fit into the L1 cache (32 KByte)
- presieving by small primes (i.e. blasting a precomputed pattern over the sieve segment)
- remembering last (or next) working offset for each prime across segments, instead of recomputing them using slow divisions
Putting all this together cuts the time for sieving the full 2^32 range from something like 20 seconds down to 2 seconds, well below the SPOI timeout. My pastebin has three simple C++ demo programs where you can inspect each of these optimisations in action and see their effect.
A much simpler approach is to do only the work that is necessary: sieve up to the square root of the last number of the target range to get all potential prime factors, and then sieve only the target range itself. That way you can solve to SPOJ problem in less than two dozen lines of code and a few milliseconds runtime. I just finished a demo .cpp for this type of segmented sieving (the difficult part was not the sieve but the test frame for comfortable testing, and the verification of proper operation up to 2^64-1 since there is hardly any reference data).
The sieve itself looks like this; the sieve is an odds-only packed bitmap, and the sieve range is specified in bits for robustness (it's all explained in the .cpp), so you would pass (range_start / 2) for offset
:
unsigned char odd_composites32[UINT32_MAX / (2 * CHAR_BIT) + 1]; // the small factor sieve
uintxx_t sieved_bits = 0; // how far it's been initialised
void extend_factor_sieve_to_cover (uintxx_t max_factor_bit); // bit, not number!
void sieve32 (unsigned char *target_segment, uint64_t offset, uintxx_t bit_count)
{
assert( bit_count > 0 && bit_count <= UINT32_MAX / 2 + 1 );
uintxx_t max_bit = bit_count - 1;
uint64_t max_num = 2 * (offset + max_bit) + 1;
uintxx_t max_factor_bit = (max_factor32(max_num) - 1) / 2;
if (target_segment != odd_composites32)
{
extend_factor_sieve_to_cover(max_factor_bit);
}
std::memset(target_segment, 0, std::size_t((max_bit + CHAR_BIT) / CHAR_BIT));
for (uintxx_t i = 3u >> 1; i <= max_factor_bit; ++i)
{
if (bit(odd_composites32, i)) continue;
uintxx_t n = (i << 1) + 1; // the actual prime represented by bit i (< 2^32)
uintxx_t stride = n; // == (n * 2) / 2
uint64_t start = (uint64_t(n) * n) >> 1;
uintxx_t k;
if (start >= offset)
{
k = uintxx_t(start - offset);
}
else // start < offset
{
uintxx_t before_the_segment = (offset - start) % stride;
k = before_the_segment == 0 ? 0 : stride - before_the_segment;
}
while (k <= max_bit)
{
set_bit(target_segment, k);
// k can wrap since strides go up to almost 2^32
if ((k += stride) < stride)
{
break;
}
}
}
}
For the SPOJ problem - numbers less than 2^32 - unsigned integers are sufficient for all variables (i.e. uint32_t instead of uintxx_t and uint64_t) and some things could be simplified further. Also, you can use sqrt()
instead of max_factor32()
for these small ranges.
In the demo code, extend_factor_sieve_to_cover()
does the moral equivalent of sieve32(odd_composites32, 0, max_factor_bit + 1)
in small, cache-friendly steps. For the SPOJ problem you can simply use the single sieve32() call since there are only 6541 small odd prime factors in numbers less than 2^32, which you can sieve in no time flat.
Hence the trick to solving this SPOJ problem is using segmented sieving, which cuts total runtime to a few milliseconds.