0

Trying to generate a random prime p in the range [2,2147483647] using the C++11 std::uniform_int_distribution.

It's been commented that this approach might not be correct:

It is not immediately obvious that this p is uniformly distributed over the set of all primes <= 2^31 - 1. Whatever uniformity and bias guarantees the random-number generator has, they refer to all integers within the range, but the code is "sieving" just the primes out of it.

However, from another similar S.O. article, it notes

As long as the input random numbers are uniformly distributed in the range, the primes chosen by this method will be uniformly distributed among the primes in that range, too.

Question

Is this code truly correct to generate a random prime?

https://onlinegdb.com/FMzz78LBq

#include <stdio.h>
#include <stdint.h>
#include <math.h>
#include <time.h>
#include <random>


int isPrimeNumber (int num)
{
  if (num == 1) return 0;

  for (int i = 2; i <= sqrt (num); i++)
  {
      if (num % i == 0)
      {
        // not prime
        return 0;
      }
  }

  // prime
  return 1;
}

int main ()
{
  std::random_device rd;
  std::mt19937 rng (rd ());
  // Define prime range from 2 to 2^31 - 1.
  std::uniform_int_distribution<int>uni (2, 2147483647);

  int prime;

  // Generate a random prime.
  do { prime = uni(rng); } while (!isPrimeNumber(prime));

  printf ("prime = %d", prime);

  return 0;
}
vengy
  • 1,548
  • 10
  • 18
  • 4
    It is "correct" in that it chooses a prime randomly. If you have some more stringent requirement, you'll need to say what it is. – Scott Hunter Jan 10 '22 at 14:31
  • 4
    Not efficient, but you choose primes in uniform way. – Jarod42 Jan 10 '22 at 14:33
  • 1
    Also, check `i = 2` seperatly. And then start from 3 increasing i by 2. That way you need to only check half of the numbers. If num is even, it will be caught by checking with 2, no need to check with other even numbers. – Robin Dillen Jan 10 '22 at 14:34
  • 2
    Why not do the traditional sieve and do the int distribution over the indexes? – Captain Giraffe Jan 10 '22 at 14:35
  • 1
    Certainly they are random. You seem to use the terms as if they were meaning the same (but not every random distribution is uniform) – 463035818_is_not_an_ai Jan 10 '22 at 14:37
  • 3
    The confusion seems to hinge on what's meant by "uniform". This code uniformly selects a prime. But primes themselves are not a uniform distribution of integers. – Drew Dormann Jan 10 '22 at 15:05
  • 2
    btw in C++ you should use the C++ versions of the headers, ie `` instead of `` etc. – 463035818_is_not_an_ai Jan 10 '22 at 15:08
  • @CaptainGiraffe a sieve with 2147483647 is rather large and it will take some time to construct it. Otherwise once the sieve is constructed and if you don't care using >200 Mb of memory it will be pretty efficient. – Jabberwocky Jan 10 '22 at 15:12
  • @Jabberwocky it's [only ~100M](https://www.wolframalpha.com/input/?i=count+of+primes+below+2%5E31) `int`s, less than half a gig of table – Caleth Jan 10 '22 at 15:16

2 Answers2

5

The code you presented:

  1. Uniformly picks a random prime number in the range. Any given prime number in the range will have the same probability of coming up as any other prime in the range.
  2. will not produce numbers that are uniformly distributed around the range of integers. E.g. there will be much more numbers in the range 1-1000 than there will be in the range 1000001-1001000.
  3. is extremely inefficient (if you were to generate many numbers)

You should get clarity on exactly what you want. If you wanted 2. above you need something different.

Jeffrey
  • 11,063
  • 1
  • 21
  • 42
  • A back of the envelope calculation is that OP's code is faster up to ~500 random primes, and the sieve and sample method is better above that – Caleth Jan 10 '22 at 15:23
  • "will not produce numbers that are uniformly distributed around the range of integers" That's OK. You cannot produce primes that are uniformly distributed around the range of integers. You can cannot produce primes that are uniformly distributed around the set of *primes* in a certain range, and that's what this code does. – n. m. could be an AI Jan 10 '22 at 15:32
1

A sketch for a proof:

Assume for the purpose of contradiction that the algorithm produces a non-uniform distribution over the primes. Then there must be a prime p that has a higher probability than a prime q. As these are also integers, that means that in the larger range from 2 to N, sieving out non-primes had no effect on either p or q, hence the uniform distribution had to produce p with a higher probability than q, which is a direct contradiction.

Note: Observe that this proof would not hold if you compute a random number and search for the next prime after that number.

However, as pointed out by Jeffrey, this algorithm is rather inefficient. One alternative comes to mind: In the range up to 2**31 you have about 10**8 primes, according to the pi function. So it is close to feasible to just create a lookup table with all those primes and store it, if not in your binary, in a separate file (I expect about 400 MB), and just compute a uniform random number in the appropriate interval and pick that number out of the lookup table.


However, just to be safe, be aware that for any purpose close to security, std::mt19937 is not a cryptographically secure PRNG.

bitmask
  • 32,434
  • 14
  • 99
  • 159