Looking for a fast prime-counting function

Question

I need to calculate the number of primes less than or equal to some N, which is the prime-counting function or the PI function. I have this one but it works too slowly:

function PI(x) {
    var primes = 4;
    for (var i = 3; i <= x; i += 2) {
        if (i % 3 === 0 || i % 5 === 0 || i % 7 === 0) continue;
        var r = ~~Math.sqrt(i), p = true;
        for (var j = 2; j <= r; j++) {
            if (i % j === 0) {
                p = false;
                break;
            }
        }
        if (p)
            primes++;
    }
    return primes;
}

Calculating PI(1000000) takes almost one second, and calculating PI(10000000) takes as long as 20 seconds for me. That's too slow. Is there a faster way to do it?

The Sieve of Eratosthenes algorithm is a fairly efficient way of finding all of the primes up to n. https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes — Alex Patch, Apr 16 '17 at 01:24
The reason the question code is so slow is that it is a very inefficient version of trial division sieving; a proper implementation of the page-segmented bit-packed Sieve of Eratosthenes [such as my answer in another thread](https://stackoverflow.com/a/57108107/549617) will find the number of primes to a billion in well under a second, to a hundred billion (1e11) in under a minute, but that is still not a true modern prime counting function. My answer below can do these in fractions of seconds and can count the primes up to `2**53 - 1` (about 9e15) in a minute or so. — GordonBGood, Nov 25 '21 at 07:34

GordonBGood · Answer 1 · 2022-11-28T10:40:46.947

I'm even later to the party, but got triggered by "prime counting function" in the question title, as that doesn't get much attention on SO.

TLDR; How Does LMO work? Skip to the end for code and results...

The history of the main advancements of the art of prime counting functions is as follows:

In about 1830, Adrien-Marie Legendre developed a formula which uses the "inclusion-exclusion principle" to calculate the number of primes below x without enumerating all of the individual primes and only requiring knowing the base prime values to the square root of x.
In 1870 E. D. F. Meissel improved Legendre's formula in reducing the number of operations somewhat but requiring knowing the primes up to x^(2/3) instead of x^(1/2), where "knowing the primes" was/is generally by the SoE, but only requires storage of O(x^(1/3)/(ln x)) instead of O(x^(1/2)/(ln x)), although it only reduces the time complexity to O(x/((ln x)^3)) from O(x/((ln x)^2)).
At last Lagarias, Miller, and Odzlinko (LMO) developed their algorithm in 1985 improving the Meissel method, still requiring the same "knowledge of prime values" up to x^(2/3) for the basic version, but greatly reducing the required time to calculate "Phi" by splitting the calculation into two groups of calculations (at least, more for later improvements), one that determines the "ordinary" leaves of the "Phi" function by the usual Meissel method using recursive integer division but this work is negligible as there are only about x^(1/3) of these, and with the time to calculate the rest of the "special" leaves greatly reduced by using partial sieving and counting techniques. Partial sieving can be used because the definition of "Phi(x, a)" is the count of all the numbers below x that are not divisible by any primes below the ath prime, which is exactly what is produced by sieving up to x of the multiples the base prime values up to the ath prime. The computational complexity of the basic LMO method is O(x^(2/3)(log (log x))).

To understand how these techniques are used, let's consider them in succession in counting the primes to a trivial range of 64, as follows:

Consider the primes up to 64, which we can easily determine as 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 57, and 61, for a total of 18 primes; for this exercise, consider that we only know the values up to the square root of 64 is 8 for the Legendre technique (the four primes of 2, 3, 5, and 7) or for the Meissel-based techniques to the range of 64^(2/3) is 16 (the six primes of 2, 3, 5, 7, 11, and 13).
Legendre's technique says we just need to calculate the value of "Phi(64, 4) + 4 - 1" where 4 is the number of primes below the square root; the "Phi" function is the number of un-excluded values up to that limit, 64, which are included because they are not multiples of the four base prime values less than the square root of the limit, and is calculated by the recursive technique as used in the previous answer, in this case represented as 64 - 64/2 - 64/3 + 64/2/3 - 64/5 + 64/5/2 + 64/5/3 - 64/5/2/3 - 64/7 + 64/7/2 + 64/7/3 - 64/7/2/3 + 64/7/5 - 64/7/5/2 - 64/7/5/3 + 64/7/5/2/3 = 15, which when four is added and one subtracted gives the correct answer of 18. This is called the "inclusion-exclusion principle" because the terms with odd numbers of prime divisors are excluded/subtracted, and the terms with even numbers of prime divisors are included/added. You can see that this takes a lot of division operations, although there are techniques such as the caching used in the previous answer to reduce these for such small ranges but unfortunately only work up to a limit.
For the Meissel technique, calculating Phi(64, 2) + 2 - 1 where 2 is the number of primes up to the cube root of the limit of four is much easier to calculate as 64 - 64/2 - 64/3 + 64/2/3 = 21 which when added to 2 and 1 is subtracted is 22 but that is not the final answer since we need to subtract the "P2" term which is the parts not excluded, thus based on the primes 5 and 7, and is calculated as the sum of the number of primes up to 64/5 - 3 + 1 plus 64/7 - 4 + 1, which is 4. So the final answer is 22 minus this 4 pairs is 18 as it should be.
The LMO technique works exactly the same as the above except that for this trivial limit there is one "special" leaf, which is the 64/2/3 leaf since 2 times 3 is 6 exceeds the cube root of 64 which is 4. If implementing this as LMO would do, to find the "Phi" for this quotient Phi(64/6, 0) we don't sieve at all since the least prime factor of the divisor is two and just count the remaining number of values up to the quotient to get the same answer of 10 for "Phi" for the leaf. The number of "special" leaves is small for such a trivial limit, but the important thing is that the number of special leaves only increases as O((limit^(2/3))/(log (limit^(1/3)))^2). Since the work to do the sieving is O(limit^(2/3)(log log limit)) which is larger, that is the controlling factor for the overall computational complexity.

So how do we implement the basic LMO algorithm? There are several steps as follows:

We make a table of the primes up to the cube root of the limit (about (limit^(1/3)) / (ln limit) in size), which takes negligible time.
We can use these primes to calculate the "S1" ordinary part of "Phi" by the recursive division technique, which takes negligible time as there are so few calculations as in only O(limit^(1/3)).
We make a table of the values of the count of primes up to the cube root from the base primes table (largest table as O(limit^(1/3)) in size); we also make a table of the intermediate "Phi" counts for each of the partially sieved steps up to the cube root base prime value, and a table of the determined "S2" root leaf parameters which can actually be determined while the "S1" value is being calculated above; all of these take negligible time and the largest (such as the last one) is proportional to just the cube root of the range, which is why space used in O(limit^(1/3)).
We sieve by page segments up to the limit^(2/3) range where each page is partially culled by one base prime value at a time above the minimum as found in the root "S2" leaves table, accumulating the sum of the counts of the number of un-culled values up to each of the "S2" leaf quotient values.
A basic page-segmented odds-only bit-packed SoE takes about 80 seconds to count the primes up to the 53-bit limit to the two thirds power, but we need to use a modified version that updates a counter table for each newly culled value for some of the culling operations, so it takes about half again as long or about 120 seconds.
The counting of the un-culled values up to each "S2" point then uses a batch counter array in this implementation, with the tiny final range for increments less than the batch size counted by a fast "pop count" technique. The original LMO paper describes the use of a binary indexed tree to accomplish this, but does not attempt to reduce the tree depth which is partly why it is somewhat slower than this implementation as it adds a log x factor to the time to cull due to this log x cost factor in updating the counters while culling which would make the culling time many times slower; the counting and accumulating for the number of "special" leaves takes just something in the order of tens of seconds for the above maximum range for this implementation. This cost is explained and solutions proposed by Kim Walisch.
When all of the base prime values up to each individual limit as needed for the "S2" calculation have been completed for a given page segment, the page-segment is then completely culled to the base prime value less than the square of the maximum value represented in the segment and the "P2" count values within the range of the segment are determined from this completed sieved page without having to sieve the identical range again.
When all page segments to the range^(2/3) limit are completed, the accumulators will contain the required values for "S2" and "P2" so that the S1 + S2 - P2 + number of base primes to the cube root of the limit - 1 can be output as the required answer of the count of prime numbers to that limit.

The following code implements the basic LMO algorithm (without the "alpha" optimization which would add considerably to the code complexity), the code for the "S2" special leaves has been loosely but not exactly translated from the "primecount" C++ code by Kim Walisch:

"use strict";

const MAXVALUE = 9007199254740991; // 2**53 - 1

// creates a function returning a lazily memoized value from a thunk...
function lazy(thunk) {
  let value = undefined;
  return function() {
if (value === undefined) { value = thunk(); thunk = null; }
return value;
  }
}

// a page-segmented odds-only bit-packed Sieve of Eratosthenes;

const PGSZBITS = 262144; // about CPU l1 cache size in bits (power of two)

const CLUT = function () { // fast "pop count" Counting Look Up Table...
  const arr = new Uint8Array(65536);
  for (let i = 0; i < 65536; ++i) {
let nmbts = 0 | 0; let v = i;
while (v > 0) { ++nmbts; v &= (v - 1) | 0; }
arr[i] = nmbts | 0; }
  return arr;
}();

function countPageFromTo(bitstrt, bitlmt, sb) {
  const fst = bitstrt >> 5; const lst = bitlmt >> 5;
  const pg = new Uint32Array(sb.buffer);
  let v0 = (pg[fst] | ((0xFFFFFFFF >>> 0) << (bitstrt & 31))) >>> 0;
  let cnt = ((lst - fst) << 5) + CLUT[v0 & 0xFFFF]; cnt += CLUT[v0 >>> 16];
  for (let i = fst | 0; i < lst; ++i) {
let v = pg[i] >>> 0;
cnt -= CLUT[v & 0xFFFF]; cnt -= CLUT[v >>> 16];
  }
  let v1 = (pg[lst] | ((0xFFFFFFFE >>> 0) << (bitlmt & 31))) >>> 0;
  cnt -= CLUT[v1 & 0xFFFF]; cnt -= CLUT[v1 >>> 16]; return cnt | 0;
}

function partialSievePage(lwi, bp, sb) {
  const btsz = sb.length << 3;
  let s = Math.trunc((bp * bp - 3) / 2); // compute the start index...
  if (s >= lwi) s -= lwi; // adjust start index based on page lower limit...   
  else { // for the case where this isn't the first prime squared instance
let r = ((lwi - s) % bp) >>> 0;
s = (r != (0 >>> 0) ? bp - r : 0) >>> 0; }
  if (bp <= 32) {
for (let slmt = Math.min(btsz, s + (bp << 3)); s < slmt; s += bp) {
  const shft = s & 7; const msk = ((1 >>> 0) << shft) >>> 0;
  for (let c = s >> 3, clmt = sb.length; c < clmt | 0; c += bp)
    sb[c] |= msk; } }
  else
for (let slmt = sb.length << 3; s < slmt; s += bp)
  sb[s >> 3] |= ((1 >>> 0) << (s & 7)) >>> 0;
}

function partialSieveCountPage(lwi, bp, cntarr, sb) {
  const btsz = sb.length << 3; let cullcnt = 0;
  let s = Math.trunc((bp * bp - 3) / 2); // compute the start index...
  if (s >= lwi) // adjust start index based on page lower limit...
s -= lwi;
  else { // for the case where this isn't the first prime squared instance
let r = ((lwi - s) % bp) >>> 0;
s = (r != (0 >>> 0) ? bp - r : 0) >>> 0; }
  if (bp <= 32) {
for (let slmt = Math.min(btsz, s + (bp << 3)); s < slmt; s += bp) {
  const shft = s & 7; const msk = ((1 >>> 0) << shft) >>> 0;
  for (let c = s >>> 3, clmt = sb.length; c < clmt | 0; c += bp) {
    const isbit = ((sb[c] >>> shft) ^ 1) & 1;
    cntarr[c >> 6] -= isbit; cullcnt += isbit; sb[c] |= msk; }
}
  }
  else
for (let slmt = sb.length << 3; s < slmt; s += bp) {
  const sba = s >>> 3; const shft = s & 7;
  const isbit = ((sb[sba] >>> shft) ^ 1) & 1;
  cntarr[s >> 9] -= isbit; cullcnt += isbit;
  sb[sba] |= ((1 >>> 0) << shft) >>> 0; }
  return cullcnt;
}

// pre-culled pattern of small wheel primes...
const WHLPRMS = [ 2, 3, 5, 7, 11, 13, 17 ];
const WHLPTRNLEN = WHLPRMS.reduce((s, v) => s * v, 1) >>> 1; // odds only!
const WHLPTRN = function() { // larger than WHLPTRN by one buffer for overflow
  const len = (WHLPTRNLEN + (PGSZBITS >>> 3) + 3) & (-4); // up 2 even 32 bits!
  const arr = new Uint8Array(len);
  for (let bp of WHLPRMS.slice(1)) partialSievePage(0, bp, arr);
  arr[0] |= ~(-2 << ((WHLPRMS[WHLPRMS.length - 1] - 3) >> 1)) >>> 0; return arr;
}();

function fillPage(lwi, sb) {
  const mod = (lwi / 8) % WHLPTRNLEN;
  sb.set(new Uint8Array(WHLPTRN.buffer, mod, sb.length));
}

function cullPage(lwi, bpras, sb) {
  const btsz = sb.length << 3; let bp = 3;
  const nxti = lwi + btsz; // just beyond the current page 
  for (let bpra of bpras()) {
for (let bpri = 0; bpri < bpra.length; ++bpri) {
  const bpr = bpra[bpri]; bp += bpr + bpr;
  let s = (bp * bp - 3) / 2; // compute start index of prime squared
  if (s >= nxti) return; // enough bp's
  partialSievePage(lwi, bp, sb);
}
  }
}

function soePages(bitsz, bpras) {
  const buf =  new Uint8Array(bitsz >> 3); let lowi = 0;
  const gen = bpras === undefined ? makeBasePrimeRepArrs() : bpras;
  return function*() {
while (true) {
  fillPage(lowi, buf); cullPage(lowi, gen, buf);
  yield { lwi: lowi, sb: buf }; lowi += bitsz; }
  };
}

function makeBasePrimeRepArrs() {
  const buf = new Uint8Array(128); let gen = undefined; // avoid data race!
  fillPage(0, buf);
  for (let i = 8, bp = 19, sqr = bp * bp; sqr < 2048+3;
                                      ++i, bp += 2, sqr = bp * bp)
if (((buf[i >> 3] >>> 0) & ((1 << (i & 7)) >>> 0)) === 0 >>> 0)
  for (let c = (sqr - 3) >> 1; c < 1024; c += bp)
    buf[c >> 3] |= (1 << (c & 7)) >>> 0; // init zeroth buf
  function sb2bprs(sb) {
const btsz = sb.length << 3; let oi = 0;
const arr = new Uint8Array(countPageFromTo(0, btsz - 1, sb));
for (let i = 0, j = 0; i < btsz; ++i)
  if (((sb[i >> 3] >>> 0) & ((1 << (i & 7)) >>> 0)) === 0 >>> 0) {
    arr[j++] = (i - oi) >>> 0; oi = i; }
return { bpra: arr, lastgap: (btsz - oi) | 0 };
  }
  let { bpra, lastgap } = sb2bprs(buf);
  function next() {
const nxtpg = sb2bprs(gen.next().value.sb);
nxtpg.bpra[0] += lastgap; lastgap = nxtpg.lastgap;
return { head: nxtpg.bpra, tail: lazy(next) };
  }
  const lazylist = { head: bpra, tail: lazy(function() {
if (gen === undefined) {
  gen = soePages(1024)(); gen.next() } // past first page
return next();
  }) };
  return function*() { // return a generator of rep pages...
let ll = lazylist; while (true) {  yield ll.head; ll = ll.tail(); }
  };
}

function *revPrimesFrom(top, bpras) {
  const topndx = (top - 3) >>> 1;
  const buf = new Uint8Array(PGSZBITS >>> 3);
  let lwi = (((topndx / PGSZBITS) >>> 0) * PGSZBITS) >>> 0;
  let si = (topndx - lwi) >>> 0;
  for (; lwi >= 0; lwi -= PGSZBITS) { // usually external limit!
const base = 3 + lwi + lwi;
fillPage(lwi, buf); cullPage(lwi, bpras, buf);
for (; si >= 0 >>> 0; --si)
  if (((buf[si >> 3] >>> 0) & ((1 << (si & 7)) >>> 0)) === (0 >>> 0))
    yield base + si + si;
si = PGSZBITS - 1;
  }
};

const TinyPrimes = [ 2, 3, 5, 7, 11, 13, 17, 19 ]; // degree eight
const TinyProduct = TinyPrimes.reduce((s, v) => s * v) >>> 0;
const TinyHalfProduct = TinyProduct >>> 1;
const TinyTotient = TinyPrimes.reduce((s, v) => s * (v - 1), 1) >>> 0;
const TinyLength = (TinyProduct + 8) >>> 2; // include zero and half point!
const TinyTotients = function() {
  const arr = new Uint32Array(TinyLength | 0);
  arr[TinyLength - 1] = 1; // mark mid point value as not prime - never is
  let spn = 3 * 5 * 7; arr[0] = 1; // mark zeroth value as not prime!
  for (let bp of [ 3, 5, 7 ]) // cull small base prime values...
for (let c = (bp + 1) >>> 1; c <= spn; c += bp) arr[c] |= 1;
  for (let bp of [ 11, 13, 17, 19 ]) {
for (let i = 1 + spn; i < TinyLength; i += spn) {
  const rng = i + spn > TinyLength ? spn >> 1 : spn;
  arr.set(new  Uint32Array(arr.buffer, 4, rng), i); }
spn *= bp;
for (let c = (bp + 1) >>> 1; // eliminate prime in pattern!
       c < (spn > TinyLength ? TinyLength : spn + 1); c += bp)
  arr[c] |= 1;
  }
  arr.reduce((s, v, i) => { // accumulate sums...
const ns = s + (v ^ 1); arr[i] = ns; return ns; }, 0);
  return arr;
}();  

function tinyPhi(m) {
  const d = Math.trunc(m / TinyProduct);
  const ti = (m - d * TinyProduct + 1) >>> 1;
  const t = ti < TinyLength
          ? TinyTotients[ti]
          : TinyTotient - TinyTotients[TinyHalfProduct - ti];
  return d * TinyTotient + t;
}

function *countPrimesTo(limit) {
  if (limit <= WHLPRMS[WHLPRMS.length - 1]) {
let cnt = 0; for (let p of WHLPRMS) { if (p > limit) break; else ++cnt; }
return cnt; }

  const bpras = makeBasePrimeRepArrs();
  if (limit < 1024**2 + 3) { // for limit < about a million, just sieve...
let p = 3; let cnt = WHLPRMS.length;
for (let bpra of bpras())
  for (let bpr of bpra) { // just count base prime values to limit
    p += bpr + bpr; if (p > limit) return cnt; ++cnt; }
  }

  if (limit <= 32 * 2 * PGSZBITS + 3) { // count sieve to about 32 million...
const lmti = (limit - 3) / 2;
let cnt = WHLPRMS.length; // just use page counting to limit as per usual...
for (let pg of soePages(PGSZBITS, bpras)()) {
  const nxti = pg.lwi + (pg.sb.length << 3);
  if (nxti > lmti) { cnt += countPageFromTo(0, lmti - pg.lwi, pg.sb); break; }
  cnt += countPageFromTo(0, PGSZBITS - 1, pg.sb);
}
return cnt;
  }

  // Actual LMO prime counting code starts here...
  const sqrt = Math.trunc(Math.sqrt(limit)) >>> 0;
  const cbrt = Math.trunc(Math.cbrt(limit)) >>> 0;
  const sqrtcbrt = Math.trunc(Math.sqrt(cbrt)) >>> 0;
  const top = cbrt * cbrt - 1; // sized for maximum required!
  const bsprms = function() {
let bp = 3; let cnt = WHLPRMS.length + 1; for (let bpra of bpras())
  for (let bpr of bpra) {
    bp += bpr + bpr; if (bp > cbrt) return new Uint32Array(cnt); ++cnt; }
  }();
  bsprms.set(WHLPRMS, 1); // index zero not used == 0!
  const pisqrtcbrt = function() {
let cnt = WHLPRMS.length; let i = cnt + 1; let bp = 3;
stop: for (let bpra of bpras())
  for (let bpr of bpra) {
    bp += bpr + bpr; if (bp > cbrt) break stop;
    if (bp <= sqrtcbrt) ++cnt; bsprms[i++] = bp >>> 0; }
return cnt;
  }();
  const pis = function() { // starts with index 0!
const arr = new Uint32Array(cbrt + 2); let j = 0;
for (let i = 1; i < bsprms.length; ++i)
  for (; j < bsprms[i]; ) arr[j++] = (i - 1) >>> 0;
for (; j < arr.length; ) arr[j++] = (bsprms.length - 1) >>> 0;
return arr;
  }();
  const phis = function() { // index below TinyPhi degree never used...
const arr = (new Array(bsprms.length)).fill(1);
arr[0] = 0; arr[1] = 3; arr[2] = 2; // unused
for (let i = WHLPRMS.length + 2; i < arr.length; ++i) {
  arr[i] -= i - WHLPRMS.length - 1; } // account for non phi primes!
return arr;
  }();
  // indexed by `m`, contains greatest prime factor index and
  // Moebius value bit; for Moebius, a one is negative...
  const specialroots = new Uint16Array(cbrt + 1); // filled in with S1 below...
  const S1 = function() { // it is very easy to calculate S1 recursively...
let s1acc = tinyPhi(limit);
function level(lmtlpfni, mfv, m) {
  for (let lpfni = 9; lpfni < lmtlpfni; ++lpfni) {
    const pn = bsprms[lpfni]; const nm = m * pn;
    if (nm > cbrt) { // don't split, found S2 root leaf...
      specialroots[m] = (lmtlpfni << 1) | (mfv < 0 ? 1 : 0); return; }
    else { // recurse for S1; never more than 11 levels deep...
      s1acc += mfv * tinyPhi(Math.trunc(limit / nm)); // down level...
      level(lpfni, -mfv, nm); // Moebius sign change on down level!
    } // up prime value, same level!
  }
}
level(bsprms.length, -1, 1); return s1acc;
  }();

  // at last, calculate the more complex parts of the final answer:
  function *complex() {
let s2acc = 0; let p2acc = 0; let p2cnt = 0; // for "P2" calculation
const buf = new Uint8Array(PGSZBITS >>> 3); let ttlcnt = 0;
const cnts = new Uint8Array(PGSZBITS >>> 9);
const cntaccs = new Uint32Array(cnts.length);
const revgen = revPrimesFrom(sqrt, bpras);
let p2v = Math.trunc(limit / revgen.next().value);
const lwilmt = Math.trunc((top - 3) / 2);

const updtmsk = ((PGSZBITS << 3) - 1) >>> 0;
let nxttm = Date.now() + 1000;
for (let lwi = 0; lwi <= lwilmt; lwi += PGSZBITS) { // for all pages
  if ((lwi & updtmsk) == updtmsk && Date.now() >= nxttm) {
    nxttm = Date.now() + 1000; yield lwi / lwilmt * 100.0 };
  let pgcnt = 0; const low = 3 + lwi + lwi;
  const high = Math.min(low + (PGSZBITS << 1) - 1, top);
  let cntstrti = 0 >>> 0;
  function countTo(stop) {
    const cntwrd = stop >>> 9; const bsndx = stop & (-512);
    const xtr = countPageFromTo(bsndx, stop, buf);
    while (cntstrti < cntwrd) {
      const ncnt = cntaccs[cntstrti] + cnts[cntstrti];
      cntaccs[++cntstrti] = ncnt; }
    return cntaccs[cntwrd] + xtr;
  }
  const bpilmt = pis[Math.trunc(Math.sqrt(high)) >>> 0] >>> 0;
  const maxbpi = pis[Math.min(cbrt, Math.trunc(Math.sqrt(limit/low)))]>>>0;
  const tminbpi = pis[Math.min(Math.trunc(top / (high + 1)),
                               bsprms[maxbpi]) >>> 0];
  const minbpi = Math.max(TinyPrimes.length, tminbpi) + 1;
  fillPage(lwi, buf); let bpi = (WHLPRMS.length + 1) >>> 0;
 
  if (minbpi <= maxbpi) { // jump to doing P2 if not range

    // for bpi < minbpi there are no special leaves...
    for (; bpi < minbpi; ++bpi) { // eliminate all Tiny Phi primes...
      const bp = bsprms[bpi]; const i = (bp - 3) >>> 1; // cull base primes!
      phis[bpi] += countPageFromTo(0, PGSZBITS - 1, buf);
      partialSievePage(lwi, bp, buf); }
    for (let i = 0; i < cnts.length; ++i) { // init cnts arr...
      const s = i << 9; const c = countPageFromTo(s, s + 511, buf);
      cnts[i] = c; pgcnt += c; }

    // for all base prime values up to limit**(1/6) in the page,
    // add all special leaves composed of this base prime value and
    // any number of other higher base primes, all different,
    // that qualify as special leaves...
    let brkchkr = false;
    for (; bpi <= Math.min(pisqrtcbrt, maxbpi) >>> 0; ++bpi) {
      const bp = bsprms[bpi];
      const minm = Math.max(Math.trunc(limit / (bp * (high + 1))),
                            Math.trunc(cbrt / bp)) >>> 0;
      const maxm = Math.min(Math.trunc(limit / (bp * low)), cbrt) >>> 0;
      if (bp >= maxm) { brkchkr = true; break; }
      for (let m = maxm; m > minm; --m) {
        const rt = specialroots[m];
        if (rt != 0 && bpi < rt >>> 1) {
          const stop = Math.trunc(limit / (bp * m) - low) >>> 1;
          const mu = ((rt & 1) << 1) - 1; // one bit means negative!
          s2acc -= mu * (phis[bpi] + countTo(stop));
        } }
      phis[bpi] += pgcnt; // update intermediate base prime counters
      pgcnt -= partialSieveCountPage(lwi, bp, cnts, buf);
      cntstrti = 0; cntaccs[0] = 0;
    }
    // for all base prime values > limit**(1/6) in the page,
    // add results of all special levels composed using only two primes...
    if (!brkchkr)
    for (; bpi <= maxbpi; ++bpi) {
      const bp = bsprms[bpi];
      let l = pis[Math.min(Math.trunc(limit / (bp * low)), cbrt)>>>0]>>>0;
      if (bp >= bsprms[l]) break;
      const piminm = pis[Math.max(Math.trunc(limit / (bp * (high + 1))),
                                  bp) >>> 0] >>> 0;
      for (; l > piminm; --l) {          
        const stop = Math.trunc(limit / (bp * bsprms[l]) - low) >>> 1;
        s2acc += phis[bpi] + countTo(stop);
      }
      phis[bpi] += pgcnt; // update intermediate base prime counters
      if (bpi <= bpilmt) {
        pgcnt -= partialSieveCountPage(lwi, bp, cnts, buf);
        cntstrti = 0; cntaccs[0] = 0; }
    }
  }

  // complete cull page segment, then count up "P2" terms in range...
  for (; bpi <= bpilmt; ++bpi) partialSievePage(lwi, bsprms[bpi], buf);
  let ndx = 0 >>> 0;
  while (p2v >= low && p2v <= high) {
    const nndx = (p2v - low) >>> 1;
    ++p2cnt; ttlcnt += countPageFromTo(ndx, nndx, buf);
    p2acc += ttlcnt;
    ndx = (nndx + 1) >>> 0; p2v = Math.trunc(limit / revgen.next().value);
  }
  if (ndx < PGSZBITS) ttlcnt += countPageFromTo(ndx, PGSZBITS - 1, buf);
}
const Piy = bsprms.length - 1;
// adjust for now known delta picbrt to pisqrt!
p2acc -= p2cnt * ((p2cnt - 1) / 2 + (Piy - WHLPRMS.length));
return S1 + s2acc - p2acc + Piy - 1;
  }
  const gen = complex(); let lst = gen.next();
  for (; !lst.done; lst = gen.next()) yield lst.value;
  return lst.value;
}

let cancelled = false;

function doit() {
  const limit =  Math.floor(parseFloat(document.getElementById('limit').value));
  const start = Date.now();
  const pggen = countPrimesTo(limit);
  function pgfnc() {
if (cancelled) {            
  document.getElementById('output').innerText = "Cancelled!!!";
}
else {
  const pgrslt = pggen.next();
  if (!pgrslt.done) {
    // value is a percent done...            
    document.getElementById('output').innerText = "Sieved " + (pgrslt.value.toFixed(3)) + "%";
    setTimeout(pgfnc, 7); return;
  }
  // value is the count result...
  const elpsd = Date.now() - start;
  document.getElementById('output').innerText = "Found " + pgrslt.value 
+ " primes up to " + limit + " in " + elpsd + " milliseconds.";      
}
cancelled = false;    
document.getElementById('go').onclick = strtclick;
document.getElementById('go').value = "Start Sieve...";            
document.getElementById('go').disabled = false;
  }
  pgfnc();
}

const cancelclick = function () {
  cancelled = true;
  document.getElementById('go').disabled = true;
  document.getElementById('go').value = "Cancelled!!!";
  document.getElementById('go').onclick = strtclick;
}

const strtclick = function () {
  const limit =  Math.floor(parseFloat(document.getElementById('limit').value));
  if (!Number.isInteger(limit) || (limit < 0) || (limit > MAXVALUE)) {
    
  document.getElementById('output').innerText = "Top limit must be an integer between 0 and " + MAXVALUE + "!";
return;
  }
  document.getElementById('output').innerText = "Sieved 0%";
  document.getElementById('go').onclick = cancelclick;
  document.getElementById('go').value = "Running, click to cancel...";
  cancelled = false;
  setTimeout(doit, 7);
};

document.getElementById('go').onclick = strtclick;

html,
body {
  justify-content: center;
  align-items: center;
  text-align: center;
  font-weight: bold;
  font-size: 120%;
  margin-bottom: 10px;
}

.title {
  font-size: 150%;
}

.input {
  font-size: 100%;
  padding:5px 15px 5px 15px;
}

.output {
  padding:7px 15px 7px 15px;
}

.doit {
  font-weight: bold;
  font-size: 110%;
  border:3px solid black;
  background:#F0E5D1;
  padding:7px 15px 7px 15px;
}

<!doctype html>
<html>
  <head>
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Legarias-Miller-Odzlinko Prime Counting Function...</title>
  </head>
  <body>
<div class="title">
  <text>
    Legarias-Miller-Odzlinko Prime Counting Function
  </text>
</div>
<div>
  <text>
    Top counting limit value:
  </text>
  <input class="input" type="textbox" id="limit" value="1e9" />
</div>
<div class="output">
  <text>The enforced limit is zero to 9007199254740991 (2**53-1).</text>
</div>
<div class="output">
  <text>Refresh the page to reset to default values or stop processing!</text>
</div>
<div class="output">
  <text id="output">
      Waiting to start...
  </text>
</div>
<div>
  <input class="doit" type="button" id="go" value="Start counting..." />
</div>
  </body>
</html>

On my Intel Skylake i5-6500 CPU at 3.6 GHz (single-threaded boost rate) running Google Chrome version 96, the above code can count the primes to ranges as follows:

limit	count	time (seconds)
1e9	50847534	0.006
1e10	455052511	0.016
1e11	4118054813	0.055
1e12	37607912018	0.234
1e13	346065536839	1.117
1e14	3204941750802	5.595
1e15	29844570422669	28.434
2**53-1 (about 9e15)	252252704148404	140.219

Note that when run under node.js, there is an additional about 0.1 second initialization delay but it runs a little faster (maybe 3%) with node.js version 16.6.1.

Interestingly, the original LMO author's program run on their IBM 3081 Model K mainframe computer (dual core at 38 MHz using about 30 Kilowatts) would have run this last basic (non-alpha) calculation to about 9e15 in about 120,000 seconds if it had been run single-core, making this program run in JavaScript single-core on a common current desktop computer about a thousand times faster where the ratio in clock speeds was only about a hundred times. This shows both the more efficient execution of modern CPU's and also the improved efficiency of my implementation. — GordonBGood, Nov 21 '21 at 01:32
Further work could be done to make the sieving calculations even faster with a more sophisticated SoE: for instance, if my SO answer [developing a maximally factorized page-segmented SoE in JavaScript](https://stackoverflow.com/a/57108107/549617) were adapted to this use, it would reduce the number of sieving operations by about 2.5 times and should make the above technique almost this ratio faster and still be the basic LMO algorithm, but I didn't do that here as it would increase the amount of code by a couple of hundred lines and wouldn't fit in this answer. — GordonBGood, Nov 21 '21 at 01:32
@Ovinus Real, Thank you - my usual rule of starting a repository is being bigger than a SO answer, and as this answer includes both the code and what would be in a README.md file, it is still well under that limit. The only way it is likely to get big enough to warrant a repo is if I were to move beyond basic LMO, and I am unlikely to do that in JavaScript as it isn't a language in which I prefer to work; the reason I used it here was to be able to get a runnable snippet within the answer itself. — GordonBGood, Nov 22 '21 at 04:51
This is much appreciated, but there is a bug in the code, because it produces the following output: Found 95813465 primes up to 1948441248 in 28 milliseconds. Found 95699684 primes up to 1948441249 in 28 milliseconds. — oluckyman, Nov 26 '22 at 00:00
@luckyman, thanks for finding this error! A few minutes work located the bug having to do with the calculation of "P2" and your testing at the exact point where the counting range crossed an exact cube of a prime quickly helped show that the error was in erroneously including the cube root prime in the "P2" caclulation, which was easily corrected by adjusting the "top" variable to **just less** than what would be used by the cube root factor. Your test is exactly what was required to show this, and should have been included in my development. The code has been corrected, and thanks again. — GordonBGood, Nov 26 '22 at 08:55
@luckyman, sorry, there were two lines to be corrected and I only included one of them in my last edit; it has been corrected properly now... — GordonBGood, Nov 27 '22 at 04:39
@luckyman, added a further few related fixes to the code today, I think this completes the changes related to what you found... — GordonBGood, Nov 28 '22 at 10:50

Ovinus Real · Answer 2 · 2021-11-13T22:59:31.360

1

A bit late to the party, but this works in ~~O(n^2/3) time and space~~ (see GordonBGood's comment below).

function eratosthenesWithPi(n) {
  let array = [], upperLimit = Math.sqrt(n), output = [];
  let pi = [0, 0]

  for (let i = 0; i < n; i++) {
    array.push(true);
  }

  for (let i = 2; i <= upperLimit; i++) {
    if (array[i]) {
      for (var j = i * i; j < n; j += i) {
        array[j] = false;
      }
    }
  }

  let cnt = 0

  for (let i = 2; i < n; i++) {
    if (array[i]) {
      output.push(i);
      cnt++
    }

    pi.push(cnt)
  }

  return {primes: new Uint32Array(output), pi: new Uint32Array(pi)}
}

const phiMemo = []
let primes = []

function Phi(m, b) {
  if (b === 0)
    return m
  if (m === 0)
    return 0

  if (m >= 800) {
    return Phi(m, b - 1) - Phi(Math.floor(m / primes[b - 1]), b - 1)
  }

  let t = b * 800 + m

  if (!phiMemo[t]) {
    phiMemo[t] = Phi(m, b - 1) - Phi(Math.floor(m / primes[b - 1]), b - 1)
  }

  return phiMemo[t]
}

const smallValues = [0, 0, 1, 2, 2, 3]
let piValues

function primeCountingFunction(x) {
  if (x < 6)
    return smallValues[x]

  let root2 = Math.floor(Math.sqrt(x))
  let root3 = Math.floor(x ** (1/3))

  let top = Math.floor(x / root3) + 1

  if (root2 + 1 >= primes.length) {
    let res = eratosthenesWithPi(top + 2)

    primes = res.primes
    piValues = res.pi
  }

  let a = piValues[root3 + 1], b = piValues[root2 + 1]

  let sum = 0

  for (let i = a; i < b; ++i) {
    let p = primes[i]

    sum += piValues[Math.floor(x / p)] - piValues[p] + 1
  }

  let phi = Phi(x, a)

  return phi + a - 1 - sum
}

console.log(primeCountingFunction(1e8))

Try it on JSFiddle: https://jsfiddle.net/vo0g274f/1/

For me this takes around 31 ms. I'm working on a more space-efficient method atm, and I'll post it here once I'm done.

edited Nov 13 '21 at 22:59

answered Jul 13 '20 at 11:51

Ovinus Real

528
3
10

This is a very nice solution. Did you manage to create a `more space-efficient method` that you mentioned? – vitaly-t Oct 21 '21 at 18:23
1

With apologies, but this answer is not `O(n^(2/3))` but `O(n/((ln n)^3))` in asymptotic time complexity and only seems fast for "smaller" ranges due the factor of `1/((ln x)^ 3)` which has its biggest benefit for smaller ranges and to the "constant offset" optimizations in caching of small values of "Phi" for this simple implementation of the Meissel prime counting function; it takes almost a minute to count the primes to a trillion/1e12 and is limited in range by excessive memory use by the naive implementation of the Sieve of Eratosthenes (SOE) sieving used in this implementation. – GordonBGood Nov 13 '21 at 18:56
@GordonBGood Thanks for the correction! It'd be cool to implement a faster method, but at that point it's probably best to just compile some well-tested C++ library like primecount to WASM – Ovinus Real Nov 13 '21 at 22:58
1

@Ovinus Real, I'm thinking about whether it might be worth a JavaScript version of something reasonably simple such as LMO. Yes, one could do it by compiling the LMO parts of [Kim Walich's primecount](https://github.com/kimwalisch/primecount) to WASM, but I'm not sure it would be small enough to be posted as a SO answer, nor would it be as easy for people to use and possibly understand. I may work on this... – GordonBGood Nov 14 '21 at 00:37
@Ovinus Real, I've posted my [basic LMO implmentation in JavaScript](https://stackoverflow.com/a/70049343/549617) and it is about as fast as expected, although with a more sophisticated SoE with wheel factorization, it should run at least twice this fast (constant factor gain). You can run it as a snippet directly inside the answer on your web browser (use Destop view for mobile), but it would be quite easy to strip out the web page stuff to make it run as a node.js console application, in fact that's how I originally developed the code. Enjoy... – GordonBGood Nov 21 '21 at 01:40

Looking for a fast prime-counting function

2 Answers2

Linked