0

Suppose I want to allocate an array of integers to store all the prime numbers less than some N. I would then need an estimate for the array size, E(N). There is mathematical function that gives the exact number of primes below N, it's the Prime-counting function - pi(n). However, it looks impossible to define the function in terms of elementary functions.

There exist some approximations to the function, but all of them are asymptotic approximations, so they can be either above or below the true number of primes and cannot in general be used as the estimate E(N).

I've tried to use tabulated values of pi(n) for certain n like power-of-two and interpolate between them. However I noticed that the function pi(n) is convex, so the interpolation between sparse table points may accidentally yield values of E(n) below true pi(n) that may result in buffer overflow.

I then decided to exploit the monotonic nature of pi(n) and use the table values of pi(2^(n+1)) as an far upper estimate for E(2^n) an interpolate between them this time.

I still feel not completely sure that for some 2^n < X < 2^(n+1) an interpolation between pi(2^(n+1)) and pi(2^(n+2)) would be the safe upper estimate. Is it correct? How do I prove it?

mbaitoff
  • 8,831
  • 4
  • 24
  • 32

3 Answers3

2

You are overthinking this. In C, you just use malloc and realloc. I'd 100 times prefer an algorithm that just obviously works instead of one that requires a deep mathematical proof.

gnasher729
  • 51,477
  • 5
  • 75
  • 98
  • Inside an inner loop of an algorithm dealing with prime numbers and factorization this would introduce an additional conditional branch to test the array overflow which may cost performance. – mbaitoff Feb 23 '16 at 10:46
  • @mbaitoff It seems highly unlikely that this has a significance performance impact. You will just need two ounter variables (one for the number of elements, one for the log2 of the array size) and one bit check per iteration. – Niklas B. Feb 23 '16 at 10:50
1

Use an upper bound. There are a number to choose from, each more complicated but tighter. I call this prime_count_upper(n) since you want a value guaranteed to be greater than or equal to the number of primes under n. See Chebyshev, Rosser and Schoenfeld, Dusart 1999, Dusart 2010, Axler 2014, and Büthe 2015. R&S is simple and not terrible: π(x) <= x/(log(x)-3/2) for x >= 67 but Dusart gives better ones for larger values. Either way, no tables or original research needed.

DanaJ
  • 724
  • 6
  • 9
  • There's also a pretty `PrimePi(n) <= (30*log(113)/113)*n/log(n) for all n>1` in comments to https://oeis.org/A209883 – mbaitoff Feb 24 '16 at 16:18
  • 1
    That's what I call the Chebyshev bound, and it's over by a huge 20% at 10^12 (vs. 1.8% for the simple one I showed). It similarly overestimates at 10^6. The constant can be reduced as x increases (e.g. 1.1056 for x >= 96098 or 1.095 for x >= 284860), but it's still a very loose bound. Panaitopol 1999 improved the R&S result to `x/(log(x)-1.112)` for x >= 4 and the error at 10^12 drops to 0.27%. Dusart 1999 is another order of magnitude, Dusart 2010 yet another (albeit for large values). – DanaJ Feb 24 '16 at 19:29
1

The prime number theorem guarantees the nth prime P(n) is on the range n log n < P(n) < n log n + n log log n for n > 5. As DanaJ suggests, tighter bounds can be computed.

If you want to store the primes in an array, you can't be talking about anything too big. As you suggest, there is no direct computation of pi(n) in terms of elementary arithmetic functions, but there are several methods for computing pi(n) exactly that aren't too hard, as long as n isn't too big. See this, for instance.

Community
  • 1
  • 1
user448810
  • 17,381
  • 4
  • 34
  • 59
  • Your answer is about the magnitude if the N'th prime, not about number of primes below N? Or you mean to compute the number of primes from the estimation of prime magnitude? – mbaitoff Feb 24 '16 at 04:17
  • 1
    The two are dual to each other; thus, P(25) = 97 and pi(97) = 25. – user448810 Feb 24 '16 at 14:45
  • I think a relatively simple upper bound formula is more appropriate, but +1 for noting that exact prime counts are much faster than most people are aware of. Without a bit of optimization for phi(x,a) it could get bogged down at large values, and seems complicated to explain or debug. It really depends on the author / use. I worry about the sheer number of examples of broken SoEs that can be found on the web. If people can't write 4 lines of code that are clearly written on Wikipedia, I don't have high hopes of them getting Legendre or Meissel correct. – DanaJ Feb 25 '16 at 21:22
  • @DanaJ: Here's one way to teach phi(_x_, _a_). First, compute a bunch of examples by sieving, and make a table of the results, say _x_ on 1..100 and _a_ on 1..10. Then, by looking at the results of the first step, observe that phi(_x_, _a_) = phi(_x_, _a_-1) - phi(_x_ / P(_a_), _a_-1) without getting too bogged down in explanations. Once that is done, give Legendre's formula for pi(_x_). At every stage stay away from complicated theory and just get people to write the program. Disclaimer: I have no experience of actually doing this. – user448810 Feb 25 '16 at 21:55
  • @user448810, you have great experience with teaching these things. I was also thinking of maintenance of the entire routine, where you have to be clear to people who are just reading the code for the first time. They often lack the motivation to learn -- they just want to fix something and move on ASAP. The line is blurry, but somewhere between "this is simplistic and wasteful, we could do better!" and "OMG, a thousand lines I don't understand just to find out the array size?" (reduce 1000 as needed, but even 100 is likely to be problematic). – DanaJ Feb 26 '16 at 01:07