Very interesting task you have!
I decided to implement for you from scratch quite advanced, large but fast (efficient) solution in pure C++ (it needs C++20 standard when compiling).
Following supplementary algorithms were used in my code - Sieve of Eratosthenes, Fermat Primality Test, Miller Rabin Primality Test, Trial Division (using primes), Binary Search.
Also external C++ library Boost Multiprecision was used to implement big integer arithmetics. You may use other libraries for that if you want, my code is made in such a way that any library can be used. Also CLang/GCC compilers have __int128
type, you can use it also instead of big-integers, if your task is only within 128-bit range.
My code supports any bit-size integers, but I thoroughly tested correctness and speed till 1024 bits (i.e. tested 8, 16, 32, 64, 128, 256, 512, 1024 bits).
All computations where possible I did using multi-core, if you have 8 cores (actually hardware-threads) then in all computations I start a pool of 8 threads and use them. But this multi-threaded approach is used only for integers bigger than 128 bits, this is a kind of heuristics enabled by default, but if needed you can force single threaded or multi-threaded algorithm for any bit-size, by setting extra parameter to main function PrevPrime()
.
My main algorithm (PrevPrime()
function) does following steps:
One time (single time per whole program run) I compute a table of primes smaller than 2^20, using Sieve of Eratosthenes. This table is enough for up to 1024-bit integers. If you use 2048 bits or bigger than more primes can be used.
When we're given N, I start going from N downwards and again using Sieve of Eratosthenes to filter out composite numbers from range (N - length, N]. Not all 2^20 primes are used, experimentally I computed which amount of primes is optimal for each input N bit-size (for 32-bit 100 primes is enough, for 64-bit 500 primes, for 128-bit - 1200 primes, for 256-bit - 7000 primes, for 512-bit - 17500 primes, for 1024-bit - 75000 primes).
Sieve of Eratosthenes needs to start sieving for each specific prime p
from some offset N - off
where off = N % p
. Because division is most expensive out of elementary operations in CPU, I did this computation of offsets using multi-threaded (multi-core) approach (for bigger integers, for smaller single thread is used).
After sieving is done we know for sure which numbers in range (N - length, N] are composite, but we're unsure about the rest if they are primes or composite. Hence last step is running Miller-Rabin's Primality Test for each remaining numbers. These tests are also done using multi-threaded approach (for bigger integers only).
Timings of PrevPrime() are following on my old and slow 1.2 GHz
CPU which has 8 hardware threads, different timings for different bit sizes of integers:
32-bit - less than 0.00005 sec
per one PrevPrime() integer, 64-bit - 0.0005 sec
, 128-bit - 0.002
sec, 256-bit - 0.012
sec, 512-bit - 0.07
-0.1
sec, 1024-bit - 0.55
-1.3
sec.
So you can see that on my slow 1.2 GHz CPU 1024-bit integer outputs previous prime within just 1 second.
Timings above depend not only bit sizes of integers, but also on density of prime numbers. It is well known that for numbers of bit size B
prime numbers appear on average once in ln(2^B)
. Hence bigger numbers not only have more difficult mathematics of big integers, but also have more rare prime numbers, hence bigger numbers need more numbers to be checked before outputting previous prime number.
To see how efficient is sieving in PrevPrime(), I provide timings for Fermat and Miller-Rabin tests themselves (F
below is Fermat, MR
is Miller-Rabin):
128-bit - composite MR & F - 0.0002
sec , prime MR - 0.0026
sec F - 0.0054
sec ; 256-bit - composite MR & F - 0.0007
sec , prime MR - 0.0116
sec F - 0.0234
sec ; 512-bit - composite MR & F - 0.0043
sec , prime MR - 0.0773
sec F - 0.1558
sec ; 1024-bit - composite MR & F - 0.0290
sec , prime MR - 0.4518
sec F - 0.9085
sec
From statistics above we can conclude that Fermat test takes twice more time than Miller-Rabin when targeting same probability of failure.
As you can see single Miller-Rabin for prime is about 0.5 second for 1024-bit, and whole PrevPrime() runs within 1 second, it means PrevPrime() is so efficient that it is just twice slower than single testing of prime number using Miller-Rabin.
Of course as you could notice I used probablistic primality tests, because 1024-bit number deterministic test takes really-really bigger time.
For probabilistic tests I've chosen target probability of failure equal to 1/2^32 = 1/4294967296, so it is very very rare that primality test will fail. You may choose smaller probability, it is tweakable in my program, if you need more precision.
From theory, each single test of Fermat gives 1/2 chance of failure (excluding Carmichael Numbers that have almost 100% chance of failure), while Miller-Rabin gives 1/4 chance of failure. Hence to reach target probability 1/2^32 one needs 32 steps of Fermat and 16 steps of Miller-Rabin.
Threshold of amount of primes needed for each bit-size is automatically tweakable, but needs a lot of single-time pre-computation, that's why I computed table and hard-coded it, but code that calculates table still is there in my code. If you need to re-compute thresholds then go to function ComputePrevPrimeOptimalPrimesCnt()
and comment out first line (with return bits <= 32 ? 128 : .....
) and run program, it will output sizes optimal for your PC and bit-sizes. But my pre-calculated table should be good for any CPU, only new bit-sizes need to be tweaked.
If you don't know the only library that I depend on is Boost. It can be easily installed under Linux through sudo apt install libboost-dev-all
. Under Windows a bit more difficult, but still can be installed through Chocolatey's Boost Package (do choco install boost-msvc-14.3
, but install Chocolatey first).
By default when program is run, it runs all Tests, tests are all functions in code that start with macro TEST(TestName) { .....
. If you need extra tests, just write a body like TEST(NewTest) { ...code here... }
, no extra registration of test is needed. If you're sure you can also remove all TEST() functions when using my library, they are not needed for functioning, except that they useful for quite thoroughly testing my library.
Important Notice!!!. This code I wrote within 1 single day, so it is not very well tested and may contain some bugs somewhere. Hence straight away using it in production is not suggested without extra attentional looking through code and/or extra testing. It is more like educational code, that can be a kind of guidance for you writing your own production-ready code.
Code goes below.
Try it online!
(Note, Try it online!
online run is limited to 128 bits (out of 1024 possible), due to GodBolt server limiting program total run-time to 3 seconds, change test_till_bits = 128
to 1024
after downloading).
SOURCE CODE IS HERE. Hosting on Github Gist Source Code. Unfortunatelly due to StackOverflow post limit of 30K symbols, I can't inline full source code here, because only my source code has 30K bytes, not counting long English description above (which is extra 10K). Hence I'm sharing source code on Github GIST (link above), and GodBolt server (Try it online!
link above), both links have full source code.
Code console output:
'Test GenPrimes' 0.006 sec
'Test GenPrimes_CntPrimes' 0.047 sec
'Test MillerRabin' 0.006 sec
Fermat_vs_MillerRabin: bits 8 prp_cnt 58 mr_time 0.0000|0.0000 f_time 0.0000|0.0000, total time 0.009 sec
Fermat_vs_MillerRabin: bits 16 prp_cnt 57 mr_time 0.0000|0.0000 f_time 0.0000|0.0000, total time 0.001 sec
Fermat_vs_MillerRabin: bits 32 prp_cnt 46 mr_time 0.0000|0.0000 f_time 0.0000|0.0000, total time 0.004 sec
Fermat_vs_MillerRabin: bits 64 prp_cnt 22 mr_time 0.0000|0.0005 f_time 0.0000|0.0011, total time 0.041 sec
Fermat_vs_MillerRabin: bits 128 prp_cnt 13 mr_time 0.0002|0.0028 f_time 0.0002|0.0056, total time 0.134 sec
Fermat_vs_MillerRabin: bits 256 prp_cnt 6 mr_time 0.0008|0.0120 f_time 0.0008|0.0245, total time 0.321 sec
Fermat_vs_MillerRabin: bits 512 prp_cnt 1 mr_time 0.0042|0.0618 f_time 0.0042|0.1242, total time 0.857 sec
Fermat_vs_MillerRabin: bits 1024 prp_cnt 1 mr_time 0.0273|0.4101 f_time 0.0273|0.8232, total time 3.807 sec
'Test IsProbablyPrime_Fermat_vs_MillerRabin' 5.176 sec
'Test PrevPrime_ReCheckWith_PrimesDiv_and_Fermat' 5.425 sec
PrevPrime 8-bit threads:1 0.0000 sec, avg distance 3.3, total time 0.001 sec
PrevPrime 8-bit threads:8 0.0000 sec, avg distance 3.3, total time 0.000 sec
PrevPrime 16-bit threads:1 0.0000 sec, avg distance 11.2, total time 0.000 sec
PrevPrime 16-bit threads:8 0.0000 sec, avg distance 11.2, total time 0.000 sec
PrevPrime 32-bit threads:1 0.0001 sec, avg distance 10.5, total time 0.001 sec
PrevPrime 32-bit threads:8 0.0001 sec, avg distance 10.5, total time 0.001 sec
PrevPrime 64-bit threads:1 0.0014 sec, avg distance 39.7, total time 0.025 sec
PrevPrime 64-bit threads:8 0.0012 sec, avg distance 39.7, total time 0.023 sec
PrevPrime 128-bit threads:1 0.0110 sec, avg distance 85.2, total time 0.170 sec
PrevPrime 128-bit threads:8 0.0084 sec, avg distance 85.2, total time 0.142 sec
PrevPrime 256-bit threads:1 0.0452 sec, avg distance 207.4, total time 0.570 sec
PrevPrime 256-bit threads:8 0.0331 sec, avg distance 207.4, total time 0.473 sec
PrevPrime 512-bit threads:1 0.1748 sec, avg distance 154.0, total time 2.246 sec
PrevPrime 512-bit threads:8 0.1429 sec, avg distance 154.0, total time 2.027 sec
PrevPrime 1024-bit threads:1 2.1249 sec, avg distance 379.4, total time 15.401 sec
PrevPrime 1024-bit threads:8 1.6030 sec, avg distance 379.4, total time 12.778 sec
'Test PrevPrime' 33.862 sec
'Program run time' 44.524 sec