2

Let pi(x) denote the number of primes <= x. For example, pi(100) = 25. I would like to create a table which stores values of pi(x) for all x <= L. I figured the quickest way would be to use the sieve of Eratosthenes. First I mark all primes, and then I use dynamic programming, summing the count of primes and increasing each time a new prime appears. This is implemented in the Java code below:

public static int [] piTableSimple (int L)
    {
        int sqrtl = (int) Math.sqrt(L);
        int [] piTable = new int [L + 1];

        Arrays.fill(piTable, 1);
        piTable[0] = 0;
        piTable[1] = 0;

        for (int i = 2 ; i <= sqrtl ; i++)
            if (piTable[i] == 1)
                for (int j = i * i ; j <= L ; j += i)
                    piTable[j] = 0;

        for (int i = 1 ; i < piTable.length ; i++)
            piTable[i] += piTable[i - 1];

        return piTable;
    }

There are 2 problems with this implementation:

  1. It uses large amounts of memory, as the space complexity is O(n)

  2. Because Java arrays are "int"-indexed, the bound for L is 2^31 - 1

I can "cheat" a little though. Because for even values of x, pi(x) = pi(x - 1), enabling me to both reduce memory usage by a factor of 2, and increase the bound for L by a factor of 2 (Lmax <= 2^32).

This is implemented with a simple modification to the above code:

public static long [] piTableSmart (long L)
    {
        long sqrtl = (long) Math.sqrt(L);
        long [] piTable = new long [(int) (L/2 + 1)];

        Arrays.fill(piTable, 1);
        piTable[0] = 0;
        piTable[1] = 0;
        piTable[2] = 1;

        for (int i = 3 ; i <= sqrtl ; i += 2)
            if (piTable[(i + 1) / 2] == 1)
            {
                long li = (long) i;
                long inc = li * 2L;
                for (long j = li * li ; j <= L ; j += inc)
                    piTable[(int) ((j + 1) / 2)] = 0;
            }

        piTable[2] = 2;

        for (int i = 1 ; i < piTable.length ; i++)
            piTable[i] += piTable[i - 1];

        return piTable;
    }

Note that the value of pi(2) = 1 is not directly represnted in this array. But this has simple workarounds and checks that solve it. This implementation comes with a small cost, that the actual value of pi(x) is not accessed in a straight-forward way, but rather to access the value of pi(x), one has to use

piTable[(x + 1) / 2] 

And this works for both even and odd values of x of course. The latter completes creating a pi(x) table for x <= L = 10^9 in 10s on my rather slowish laptop.

I would like to further reduce the space required and also increase the bound for L for my purposes, without severly deteriorating performance (for example, the cost of slightly more arithmetic operations to access the value of pi(x) as in the latter code barely deteriorates performance). Can it be done in an efficient and smart way?

MC From Scratch
  • 465
  • 2
  • 7
  • 1
    [this answer](https://stackoverflow.com/questions/9625663/calculating-and-printing-the-nth-prime-number/9704912#9704912) mentions a method to calculate pi(x) in about O(x^0.7) time, FYI. – Will Ness Sep 05 '19 at 15:30
  • Yes, the most efficient method for calculating pi(x) is due to Lagarias, Miller & Odlyzko - `O(x^(2/3) / (log x)^2)`. But for large values of `x`, larger pre-computed pi tables are required. I am trying to figure out a way to efficiently store these values. It's an excellent answer by the way, thank you. @WillNess – MC From Scratch Sep 06 '19 at 19:12

2 Answers2

2

You should use a segmented Sieve of Eratosthenes, which reduces the memory requirement from O(n) to O(sqrt(n)). Here is an implementation.

Do you need to store all the pi? Here is a function that computes pi(x). It's reasonably quick up to 10**12.

If you find this useful, please upvote this answer and also the two linked answers.

user448810
  • 17,381
  • 4
  • 34
  • 59
  • Hey, thank you. My bigger goal is indeed the prime counting function pi(x). My current Java implementation of pi(x) computes `pi(10^12) = 37607912018` in just `0.11s`. I want to reach values higher than `10^20` though, which require large pi(x) tables, for all `x <= L` for some L. The `L` I need is around `10^11` (not for `10^20` specifically, but rather it is sufficient even for computing `pi(x)` for `x > 10^24`). For speed reasons, I need the entire table available, rather than a segmented one. – MC From Scratch Aug 29 '19 at 13:38
1

Now that I understand better what you want to do, I can give a better answer.

The normal way to compute pi(x) starts with pre-computed tables arranged at intervals, then uses a segmented sieve to interpolate between the pre-computed points; the pre-computations may be done by sieving or by any of several other methods. Those tables get big, as you have pointed out. If you want to be able to compute pi(x) up to 1020, and you are willing to sieve a range up to 1012 each time someone calls your function, you will need a table with 108 64-bit integers, which will take nearly a gigabyte of space; calls to your function will take about half-a-minute each for the sieving, assuming a recent-vintage personal computer. Of course, you can choose where you want to be on the time/space trade-off curve by choosing how many pre-computed points you will have.

You are talking about computing pi(x) for x > 1024, which will take lots more space, or lots more time, or both. Lots. Recent projects that have computed huge values of pi(x), for values of x like 1024 or 1025, have taken months to compute.

You might want to look at Kim Walisch's primesieve program, which has a very fast segmented sieve. You might also look at the website of Tomás Oliveira e Silva, where you will find tables of pi(x) up to 1022.

Having said all that, what you want to do probably isn't feasible.

user448810
  • 17,381
  • 4
  • 34
  • 59
  • Thank you for the answer. Indeed I am familiar with the links. I know such computations take a rather long time, and lots of space, but they are still feasible. If anything, I like the challenge in any case, and would like to reach as far as I can. So I am still trying to figure out a way to efficiently store large `pi(x)` tables. – MC From Scratch Sep 06 '19 at 19:15
  • Given the amount of data, you will probably need something on disk. Just a plain-ascii file with two tab-separated columns for x and pi(x), sorted in ascending order, is probably sufficient. The disk access time will be insignificant compared to the computation, so don't worry about not having the data in memory. – user448810 Sep 06 '19 at 19:54
  • Do you know of any source that contains such an extensive list of consecutive pi(x) values? – MC From Scratch Sep 07 '19 at 09:04
  • Silva's site has tables. – user448810 Sep 07 '19 at 14:50