18

I'm currently using 1 + (int)(rand() * 999.0 / RAND_MAX) to generate a random number between 1 and 999 inclusive but the two and one digit numbers don't occur as often as the three digit numbers.

How can I fix this?

Note that although the original code gives a range of 0 to 999 inclusive, I actually want a range of 1 to 999 inclusive.

  • 24
    You want a non-uniform distribution where the probability for the 900 3-digit numbers is 1/3 of that for the 9 1 digit numbers, and so on. A simple way to achieve this practically is to pick a random number between 1 and 3, uniformly, then pick that many random digits from 0 to 9. – Andrew May 06 '16 at 13:15
  • 2
    @Andrew almost correct ;) The first digit shouldnt be 0 with this approach – 463035818_is_not_an_ai May 06 '16 at 13:16
  • 6
    My idea was similar, get a random between 1 and 3 and if its 1, pick another random number 1-9, if it's 2, 10-99 and if it's 3 100-999. That should do it,. – Vucko May 06 '16 at 13:18
  • 2
    A small point, but also see http://stackoverflow.com/questions/10219355/does-n-rand-rand-max-make-a-skewed-random-number-distribution: Dividing by `RAND_MAX` does create bias, but probably no worse than `rand()` itself. – Bathsheba May 06 '16 at 13:22
  • @Bathsheba - I assume the name is a reference to this: http://www.bbc.co.uk/news/uk-england-36064659 – neil May 06 '16 at 21:00
  • To expand on @Bathsheba’s comment — using your method of generating a uniform distribution of numbers has *very* bad statistical properties, don’t use it. Full stop. Some people will argue that “well, it’s actually not too bad for most applications.” Disregard these people. There are trivial ways in C++ of generating much better distributions, there’s no reason whatsoever to settle for something objectively worse. – Konrad Rudolph May 09 '16 at 09:00

3 Answers3

29

Your observation that one digit numbers don't occur as often as two and three digit numbers is not surprising.

There are only 9 one digit numbers (not including zero), but there are 90 two-digit ones, and 900 three digit ones. So a uniform random number generator will draw numbers in that frequency.

To generate random numbers in the range [1, 999] such that the probability of their having 1, 2, and 3 digits is equal, use your favourite generator to generate a random number p, say, in the range [0, 1) (see the new random library functions in C++ to do that), and transform it using std::pow(1000, p);.

You should note that the resulting distribution will not be piecewise-uniform: that is to say that the probability of drawing a number with a certain number of digits is not the same as the probability of drawing any other number with that number of digits. But it does have a continuous and differentiable cumulative density function which can be important mathematically.

(For the mathematically-inclined, the transformation I'm applying is the quantile function of the distribution that the OP needs).

Bathsheba
  • 231,907
  • 34
  • 361
  • 483
  • How exactly does this do what you want? First of all, `pow(base, exp)` works as base^exp, not the other way around. In your case its 1000^p? – Vucko May 06 '16 at 14:52
  • 1
    @Vucko Yes. So what is `1000^(1/3)`? What is `1000^(2/3)`? What is `1000^1`? – Sumurai8 May 06 '16 at 15:30
  • I know, but how does this ensure a uniform distribution and also `integer` numbers? – Vucko May 06 '16 at 15:37
  • 26
    Anybody considering this solution should note that it's **not** a uniform distribution. You're 6 times more likely to get a 1 than a 9, 9 times more likely to get 10 than 99, and 10 times more likely to get 100 than 999. And you won't get zero at all. – Mark Ransom May 06 '16 at 16:35
  • 1
    I'm sorry but this was exactly what I wanted. I've amended my answer. –  May 06 '16 at 18:36
  • 1
    Hum. This answer was incorrect until you edited the question. @MarkRansom raised a very good point. – Bathsheba May 06 '16 at 18:40
  • @FaceyMcFaceFace then you might also be interested in [Benford's Law](https://en.wikipedia.org/wiki/Benford%27s_law). – Mark Ransom May 06 '16 at 19:26
20

You can also use if statements, which are a little bit faster:

    int m=rand();
    if(m%3+1==3)
        z=(int)rand()%900+100;
    else if(m%3+1==2)
        z=(int)rand()%90+10;
    else if(m%3+1==1)
        z=(int)rand()%10;

The clock() difference for 100,000,000 repeats is:

t_pow: 23912
t_if: 6640

Test code using clock ticks for performance distribution - IF, POW

Difference in distribution between if and pow variant: Plot in wolframalpha.com

Foto Blysk
  • 344
  • 2
  • 11
  • 1
    If I'm reading that right, you're generating random numbers in the range of (0-9), (10-19), (100-199).... I think your first two mods need to be 900 and 90, not 100 and 10. – Hellion May 06 '16 at 15:49
  • You are right, sorry for that mistake, I will edit it. – Foto Blysk May 06 '16 at 15:55
  • upvoted. Sorry I had the incorrect question. I've changed it a bit. I don't want the number zero. –  May 06 '16 at 18:35
  • Your answer is biased. http://stackoverflow.com/questions/10984974/why-do-people-say-there-is-modulo-bias-when-using-a-random-number-generator – Olivier Grégoire May 07 '16 at 10:16
  • 1
    This is a good answer if you do desire piecewise uniformity. But over-consumption of random numbers can create problems with some generators (e.g. Sobol). Just be aware of this. And note that taking a modulus of `rand()` tends to create bias, particularly for a small modulus. Do test that this has the required statistical properties. – Bathsheba May 09 '16 at 07:01
  • 1
    @Bathsheba actually the smaller the modulus is compared to `RAND_MAX`, the lower the bias - the worst case is `% RAND_MAX`, where `0` is *twice* as likely to occur as any other number. And if the modulus is a power of 2, for most (all?) implementations of `rand` the bias is eliminated completely. – Mark Ransom May 10 '16 at 19:36
2

You can do it in one statement that uses 2 randoms. Multiplies the first by 10 ** second. The second is 1, 2 or 3. Here it is as an Excel formula:

=Int(Rand()*10^(int(Rand()*3)+1))
Zach Saucier
  • 24,871
  • 12
  • 85
  • 147
Lefty
  • 391
  • 1
  • 13
  • 1
    A good answer when you have a floating-point random number generator. Unfortunately C/C++'s `rand` isn't one of them. And @FaceyMcFaceFace, yes it *does* allow zero, if the first `rand` returns zero or very close to it. And for that reason it's also slightly biased to the lower numbers; 0-9 will appear more than 1/3 of the time. – Mark Ransom May 06 '16 at 22:19
  • @mark ransom Yes, you're right. I realised the *10 and *100 options could still allow for lower numbers after I posted it. 0-9 can result from all 3 options and 10-99 can also result from the *100 option so there's an elevated chance of those too. Yes, in principle it can produce zero too. Didn't realise that about c++ Rand, my only experience c++ it is in Arduino and I thought it was a limitation of the hardware not the language itself. – Lefty May 06 '16 at 22:56
  • @Lefty `rand` is extremely old and primitive. C++ recently added more modern random number capabilities, see for example [`std::uniform_real_distribution`](http://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution). – Mark Ransom May 06 '16 at 23:05
  • @MarkRansom I'm pretty old and primitive myself so am well versed in using floating point random numbers on virtually every platform I've used. Most of the time, you're just manipulating them into Ints anyway. The Arduino library I've been playing with lately has it's own Int rand function which suits my purposes there - but I never realised that C++ itself doesn't allow for them. – Lefty May 07 '16 at 08:28