307

I have observed that when the rand() library function is called just once within a loop, it almost always produces positive numbers.

for (i = 0; i < 100; i++) {
    printf("%d\n", rand());
}

But when I add two rand() calls, the generated numbers now include many more negative values.

for (i = 0; i < 100; i++) {
    printf("%d = %d\n", rand(), (rand() + rand()));
}

Can someone explain why I am seeing negative numbers in the second case?

PS: I initialize the seed before the loop with srand(time(NULL)).

gsamaras
  • 71,951
  • 46
  • 188
  • 305
badmad
  • 2,059
  • 2
  • 14
  • 14
  • 13
    `rand()` can't be negative... – twentylemon Jun 13 '15 at 17:17
  • 296
    rand() + rand() can overflow – maskacovnik Jun 13 '15 at 17:17
  • 15
    What is `RAND_MAX` for your compiler? You can usually find it in `stdlib.h`. (Funny: checking `man 3 rand`, it bears the one-line description "bad random number generator".) – Jongware Jun 13 '15 at 17:26
  • what if you use unsigned int or long long in place of int as the specifier? – Sagar Kar Jun 13 '15 at 18:13
  • 2
    @SagarKar: That would be undefined behavior. `rand()+rand()` is an `int`. – Mooing Duck Jun 14 '15 at 01:44
  • 6
    do what every sane programmer would do `abs(rand()+rand())`. I'd rather have a positive UB than a negative one! ;) – Vinicius Kamakura Jun 15 '15 at 17:32
  • 12
    @hexa: that is no solution for the UB, as the UB occurs for the addition already. You cannot make UB become _defined behaviour_. A _sane_ programmer would avoid UB like hell. – too honest for this site Jun 15 '15 at 18:07
  • 4
    Even without overflow, `rand() + rand()` would produce a triangular distribution (like rolling two dice), not a uniform one. That's probably not what you want. – dan04 Jun 15 '15 at 20:44
  • 3
    What do you mean "it **almost** always produces positive numbers" on its own? `rand()` is defined to produce numbers in a specific range (between 0 and RAND_MAX), so the only non-positive numbers you should be seeing would be zeros. So where are you seeing any negative numbers in the first case? – ClickRick Jun 16 '15 at 08:09
  • 2
    @dan04: but with overflow with RAND_MAX equal to INT_MAX, abs(rand() + rand()) has a uniform distribution again. – RemcoGerlich Jun 16 '15 at 09:44
  • Also, consider `arc4random()` when generating multiple random numbers. – Cœur Jun 16 '15 at 10:04
  • 2
    Also consider that `rand()+rand()` is not yielding the same random distribution between 0 and 2*RAND_MAX, for the same reason that rolling a 6-sided die twice yields different results than rolling a 12-sided die once. – Gus Jun 16 '15 at 16:33
  • 3
    @Olaf didn't you get that taste of sarcasm? – dhein Jun 17 '15 at 13:39
  • @Zaibis: Sorry, I'm not good at that even in my own language sometimes :-) – too honest for this site Jun 17 '15 at 17:55
  • @XiangJi: "should be ..." Well, I should have gotten 1 cent (whatever currency) for that. There's a plethora of people out there who do not have an idea about the binary arithmetic and the C standard. Most think 2s complement is the only representation of signed values - if they think about that at all. – too honest for this site Jun 17 '15 at 17:59
  • 2
    @XiangJi: There is no problem in asking "a" question, basic or complex. You can miss a thing here and there. – badmad Jun 18 '15 at 18:20
  • 1
    **Hint**: try saving the random numbers to variables, and printing the pairs with a negative sum. – Colonel Panic Jun 18 '15 at 18:25
  • 1
    @Olaf what is a UB? What are you talking about? – M Y Jun 19 '15 at 03:42
  • @badmad Sure sure. That's what SO is about. Just a bit surprised it got this many upvotes and became the top question in the newsletter. – xji Jun 19 '15 at 08:45
  • 1
    @hellyale: [_undefined behaviour_](http://port70.net/~nsz/c/c11/n1570.html#3.4.3p1) – too honest for this site Jun 19 '15 at 11:10

9 Answers

547

rand() is defined to return an integer between 0 and RAND_MAX.

rand() + rand()

could overflow. What you observe is likely a result of undefined behaviour caused by integer overflow.

P.P
  • 117,907
  • 20
  • 175
  • 238
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/80565/discussion-on-answer-by-blue-moon-why-does-rand-rand-produce-negative-numb). – Martijn Pieters Jun 15 '15 at 11:01
  • Actually this would depend if `rand` returns a signed or unsigned integer, as overflow for unsigned integers is well defined. – Jakub Arnold Jun 15 '15 at 18:04
  • @JakubArnold: the standard defines `rand()` to return `int`. – too honest for this site Jun 15 '15 at 18:04
  • @Olaf oh sorry I didn't notice it was a C specific question, thought it was language agnostic. – Jakub Arnold Jun 15 '15 at 18:05
  • 4
    @JakubArnold: How so, as overflow behaviour is specified differently by each language? Python for instance has none (well, up to available memory), as int just grows. – too honest for this site Jun 15 '15 at 18:09
  • 2
    @Olaf It depends how a language decides to represent signed integers. *Java* had no mechanism to detect integer overflow (until java 8) and defined it to wrap around and *Go* uses 2's complement representation only and defines it legal for signed integer overflows. C obviously supports more than 2's complement. – P.P Jun 15 '15 at 18:28
  • @BlueMoon: That was exactly my point. However, careful, otherwise we will be moved to chat :-) (C does allow other than 2s complement, but gcc actually does not, btw.) – too honest for this site Jun 15 '15 at 18:30
  • @JakubArnold even if it returned unsigned; using `%d` to print an unsigned integer is undefined. – M.M Jun 15 '15 at 21:04
  • And as commented below, it can overflow. When that happens it goes from the largest positive number it can take directly to the negative equivalent. – Evan Carslake Jun 16 '15 at 16:45
  • 2
    @EvanCarslake No, that's not a universal behaviour. What you say is about 2's complement representation. But the C language allows for other representations too. The C language specification says signed integer overflow is *undefined*. So in general, no program should rely on such behaviour and needs to code carefully to not cause signed integer overflow. But this is not applicable for unsigned integers as they would "wrap-around" in a well-defined (reduction modulo 2^N) manner. [continued]... – P.P Jun 16 '15 at 17:19
  • 12
    This is the quote from C standard related to signed integer overflow: **If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.** – P.P Jun 16 '15 at 17:19
  • 3
    @EvanCarslake moving a bit away from the question, the C compilers do use the standard and for signed integers they can assume that `a + b > a` if they know that `b > 0`. They can also assume that if there is a later executed statement `a + 5`, then the current value is lower than `INT_MAX - 5`. So even on a 2's complement processor/interpreter without traps the program might not behave as if `int`s were 2's complement without traps. – Maciej Piechotka Jun 18 '15 at 08:02
91

The problem is the addition. rand() returns an int value of 0...RAND_MAX. So, if you add two of them, you will get up to RAND_MAX * 2. If that exceeds INT_MAX, the result of the addition overflows the valid range an int can hold. Overflow of signed values is undefined behaviour and may lead to your keyboard talking to you in foreign tongues.

As there is no gain here in adding two random results, the simple idea is to just not do it. Alternatively you can cast each result to unsigned int before the addition, if that can hold the sum. Or use a larger type. Note that long is not necessarily wider than int; the same applies to long long if int is at least 64 bits!

Conclusion: Just avoid the addition. It does not provide more "randomness". If you need more bits, you might concatenate the values sum = a + b * (RAND_MAX + 1), but that also likely requires a larger data type than int.

As your stated reason is to avoid a zero-result: That cannot be avoided by adding the results of two rand() calls, as both can be zero. Instead, you can just increment. If RAND_MAX == INT_MAX, this cannot be done in int. However, (unsigned int)rand() + 1 will do very, very likely. Likely (not definitively), because it does require UINT_MAX > INT_MAX, which is true on all implementations I'm aware of (which covers quite some embedded architectures, DSPs and all desktop, mobile and server platforms of the past 30 years).

Warning:

Although already sprinkled in comments here, please note that adding two random values does not give a uniform distribution, but a triangular distribution, like rolling two dice: to get 12 (two dice) both dice have to show 6. For 11 there are already two possible variants: 6 + 5 or 5 + 6, etc.

So, the addition is also bad from this aspect.

Also note that the results rand() generates are not independent of each other, as they are generated by a pseudorandom number generator. Note also that the standard does not specify the quality or uniform distribution of the calculated values.

too honest for this site
  • 12,050
  • 4
  • 30
  • 52
  • thx. the reason i was adding was to avoid '0' as the random number in my code. rand()+rand() was the quick dirty solution which readily came to my mind. – badmad Jun 13 '15 at 18:08
  • 14
    @badmad: So what if both calls return 0? – too honest for this site Jun 13 '15 at 18:08
  • what are the chances? rarer compared to using one rand() – badmad Jun 13 '15 at 18:13
  • 3
    @badmad: I just wonder if `UINT_MAX > INT_MAX != false` is guaranteed by the standard. (Sounds likely, but not sure if required). If so, you can just cast a single result and increment (in that order!). – too honest for this site Jun 13 '15 at 18:16
  • @badmad: Consider this, ideally a random number generator needs a uniform distribution. This means that all numbers have equal chances to appear. Now, for rand() to be 0, that is a single value. But, rand() + rand() can yield 0 in 2+ ways: both calls return 0 or both calls return half INT_MAX meaning when they sum the result will be 0. In this scenario, you are now twice as likely to get 0. However, if RAND_MAX is larger than half INT_MAX, more values can sum to 0. In essence, the sum of two uniform distribution is not uniform. – Nicholas Frechette Jun 15 '15 at 18:42
  • @NicholasFrechette: It is not twice as likely to get two `0` in a row. p(0) = 1/(RAND_MAX + 1). p(0, 0) = p(0) * p(0). – too honest for this site Jun 15 '15 at 18:56
  • 3
    There is gain in adding multiple random numbers when you want a non-uniform distribution: http://stackoverflow.com/questions/30492259/get-a-random-number-focused-on-center – Cœur Jun 16 '15 at 00:09
  • @Cœur: Yes, but that is not the point here. You still have to care about overflow/UB. However, I will correct my answer a bit - thanks. A general problem here is to split between algorithm and implementation - this question is clearly about implementation. – too honest for this site Jun 16 '15 at 01:05
  • 6
    to avoid 0, a simple "while result is 0, re-roll" ? – Olivier Dulac Jun 16 '15 at 06:32
  • @OlivierDulac: There are applications you cannot allow for a loop (yeah, it is very unlikely, but - yes). My point is, however more about the int overflow. – too honest for this site Jun 16 '15 at 10:36
  • 2
    Not only is adding them a bad way to avoid 0, but it also results in a non-uniform distribution. You get a distribution like the results of rolling dice: 7 is 6 times as likely as 2 or 12. – Barmar Jun 16 '15 at 20:30
  • @Olaf [Re:](https://stackoverflow.com/questions/30821406/why-does-rand-rand-produce-negative-numbers/30821874#comment49690053_30821874) As I read §6.2.5 §6.2.5 9 "The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type" implies `UINT_MAX >= INT_MAX`, not `UINT_MAX > INT_MAX`. Character types have other restrictions. The only system I have encountered with `INTn_MAX == UINTn_MAX` was BITD on an extra wide type where the underlying code only handled signed math, so the widest unsigned type had a padded bit instead of a sign bit. – chux - Reinstate Monica Feb 16 '18 at 22:52
  • @chux: Thanks for pointing that out. I actually thought I had covered that, but maybe my wording was a bit unclear. I actually did not state the `>` was guaranteed, but that it was a necessary requirement for the addition not to invoke UB. When I wrote this answer I was pretty new in standard nit-picking. Anyway, I hope the new wording is a bit more clear. – too honest for this site Feb 16 '18 at 23:10
  • @Olaf Nice update. IMO, Code can assert `Utype_MAX/2 == Stype_MAX` unless one also wants high portability to 1's complement, sign-magnitude, 9-bit bytes and other exotic/historic platforms. Interesting exercises, yet ever less relevant. – chux - Reinstate Monica Feb 16 '18 at 23:22
  • No need for the division and potentially it adds another problem. The original condition `>` is sufficient. Just use a `_Static_assert` in the code. But that is a dirty hack anyway for most usages, as one rarely wants the full range. – too honest for this site Feb 16 '18 at 23:26
  • for 11 you've got 5:6, 6:5, but for 12 you've got 6:6, 6:6 so probabilities for getting 11 and 12 are equal. Perhaps you meant to say 10 which can be achived with 5:5, 5:5, 6:4, 4:6 which is twice as likely. – nurettin Mar 05 '21 at 15:32
36

This is an answer to a clarification of the question, made in a comment on this answer:

the reason i was adding was to avoid '0' as the random number in my code. rand()+rand() was the quick dirty solution which readily came to my mind.

The problem was to avoid 0. There are (at least) two problems with the proposed solution. One is, as the other answers indicate, that rand()+rand() can invoke undefined behavior. Best advice is to never invoke undefined behavior. Another issue is there's no guarantee that rand() won't produce 0 twice in a row.

The following rejects zero, avoids undefined behavior, and in the vast majority of cases will be faster than two calls to rand():

int rnum;
for (rnum = rand(); rnum == 0; rnum = rand()) {}
// or do rnum = rand(); while (rnum == 0);
Community
  • 1
  • 1
David Hammen
  • 32,454
  • 9
  • 60
  • 108
  • 9
    What about `rand() + 1`? – askvictor Jun 14 '15 at 09:35
  • 4
    @askvictor That could overflow (although it's unlikely). – gerrit Jun 14 '15 at 10:25
  • 3
    @gerrit - depends on MAX_INT and RAND_MAX – askvictor Jun 14 '15 at 10:26
  • 1
    @askvictor Oh, right. I thought they would be the same, but that is not stated anywhere. My mistake. – gerrit Jun 14 '15 at 10:30
  • 3
    @gerrit, I would be surprised if they are _not_ the same, but I suppose this is a place for pedants :) – askvictor Jun 14 '15 at 10:31
  • 1
    @askvictor - On my computer, with my compilers, `RAND_MAX` is `MAX_INT`. So adding one could overflow. Unlikely, but not impossible. I've used other systems where `RAND_MAX` is only 32767. There adding one cannot overflow. – David Hammen Jun 14 '15 at 11:47
  • @DavidHammen: 32767 + 1 == 0x8000, which **is** actually overflow for 16 bit `int` (reason for this value of RAND_MAX might very well be some 16 bit legacy/function call). Please note that `int` can be very well 16 bit wide (and _is_ for many systems). – too honest for this site Jun 14 '15 at 12:14
  • Unlikely but if your application generates 1 million random numbers per second it's going to happen some 20 times per day, assuming RAND_MAX==MAX_INT on a 32 bit system. – gerrit Jun 14 '15 at 13:55
  • 12
    If RAND_MAX==MAX_INT, rand() + 1 will overflow with exactly the same probability as the value of rand() being 0, which makes this solution completely pointless. If you are willing to risk it and ignore the possibility of an overflow, you can as well use rand() as is and ignore the possibility of it returning 0. – Emil Jeřábek Jun 14 '15 at 16:47
  • 1
    @DavidHammen Yes, thats an elegant solution. – badmad Jun 14 '15 at 23:19
  • @gerrit - "That could overflow (although it's unlikely)." Don't you mean "That *will* eventually overflow." – Kevin Fegan Jun 21 '15 at 12:22
  • 2
    @KevinFegan Any event with $p>0$ will eventually happen if the experiment is repeated sufficiently often. The two statements are identical. – gerrit Jun 22 '15 at 09:40
3

Basically rand() produces numbers between 0 and RAND_MAX, and 2 * RAND_MAX > INT_MAX in your case.

You can take the modulus with the maximum value of your data type to prevent overflow. This will of course disrupt the distribution of the random numbers, but rand is just a way to get quick random numbers.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void)
{
    int i=0;

    for (i=0; i<100; i++)
        printf(" %d : %d \n", rand(), ((rand() % (INT_MAX/2))+(rand() % (INT_MAX/2))));

    for (i=0; i<100; i++)
        printf(" %d : %ld \n", rand(), ((rand() % (LONG_MAX/2))+(rand() % (LONG_MAX/2))));

    return 0;
}
Khaled.K
  • 5,828
  • 1
  • 33
  • 51
2

Maybe you could try a tricky approach by ensuring that the sum of two rand() calls never exceeds RAND_MAX. A possible approach is sum = rand()/2 + rand()/2;. For a 16-bit compiler with a RAND_MAX value of 32767, even if both calls happen to return 32767, then since 32767/2 = 16383 and 16383 + 16383 = 32766, the sum does not become negative.

Jibin Mathew
  • 4,816
  • 4
  • 40
  • 68
  • 1
    The OP wanted to exclude 0 from the results. The addition also does not provide a uniform distribution of random values. – too honest for this site Jun 16 '15 at 23:20
  • @Olaf: There's no guarantee that two consecutive calls to `rand()` won't both yield zero, so a desire to avoid zero isn't a good reason for adding two values. On the other hand, a desire to have a non-uniform distribution would be a good reason to add two random values if one ensures that overflow doesn't occur. – supercat Jun 13 '18 at 16:53
1

To avoid 0, try this:

int rnumb = rand()%(INT_MAX-1)+1;

You need to include limits.h.

lost_in_the_source
  • 10,998
  • 9
  • 46
  • 75
Doni
  • 35
  • 2
1

the reason i was adding was to avoid '0' as the random number in my code. rand()+rand() was the quick dirty solution which readily came to my mind.

A simple solution (okay, call it a "Hack") which never produces a zero result and will never overflow is:

x=(rand()/2)+1    // using divide  -or-
x=(rand()>>1)+1   // using shift which may be faster
                  // compiler optimization may use shift in both cases

This will limit your maximum value, but if you don't care about that, then this should work fine for you.

Kevin Fegan
  • 1,292
  • 1
  • 16
  • 29
  • 1
    Sidenote: Careful with right shifts of signed variables. It is only well-defined for nonnegative values, for negatives, it is implementation defined. (Luckily, `rand()` always returns a nonnegative value). However, I would leave the optimization to the compiler here. – too honest for this site Jun 25 '15 at 02:21
  • @Olaf: In general, signed division by two will be less efficient than a shift. Unless a compiler writer has invested effort in telling the compiler that `rand` will be non-negative, the shift will be more efficient than division by a signed integer 2. Division by `2u` could work, but if `x` is an `int` may result in warnings about implicit conversion from unsigned to signed. – supercat Jun 13 '18 at 16:55
  • @supercat: Please read my comment carefully again. You should very well know any reasonable compiler will use a shift for `/ 2` anyway (I've seen this even for something like `-O0`, i.e. without optimisations explicitly requested). It's possibly the most trivial and most established optimisation of C code. Point is the division is well defined by the standard for the whole integer range, not only non-negative values. Again: leave optimisations to the compiler, write **correct** and clear code in the first place. This is even more important for beginners. – too honest for this site Jun 13 '18 at 17:05
  • @Olaf: Every compiler I've tested generates more efficient code when shifting `rand()` right by one or dividing by `2u` than when dividing by 2, even when using `-O3`. One might reasonably say such optimization is unlikely to matter, but saying "leave such optimizations to the compiler" would imply that compilers would be likely to perform them. Do you know of *any* compilers that actually will? – supercat Jun 13 '18 at 17:24
  • @supercat: You should use more modern compilers then. gcc just generated fine code the last time I checked the generated assembler. Nevertheless, as much as I appreciate having a groupie, I'd prefer not to be harassed to the extent you presented last time. These posts are years old; my comments are perfectly valid. Thank you. – too honest for this site Jun 13 '18 at 17:26
  • @Olaf: I just checked gcc 8.1 and clang 6.0 on godbolt. Given `int rdiv2a(void) { return rand()/2; } int rdiv2b(void) { return rand()>>1; }` they produce 8 or 9 instructions (respectively) for rdiv2a, while producing 5 for rdiv2b. Using `/2` isn't "leaving optimizations up to the compiler", it's deciding to do without them. In most cases, such optimizations won't matter, but one should be aware that's what one is doing. – supercat Jun 13 '18 at 18:26
  • @supercat: Yes, it seems to do this to handle the negative case correctly. But it does **not** use a division as you proposed. The two additional instructions are at most 1 clock cycle each (they can likely be folded in a non-synthetic program, but even if not, it is completely irrelevant here), btw. If you want to have no difference, use `unsigned`, which is as fine for `rand()` and the much better "optimisation" anyway here. Oh, and a different arch than x64 (which I used) could very well result in identical code for both. Btw: that's all I will discuss here. You're proposing bad practice. – too honest for this site Jun 13 '18 at 19:12
0

thx. the reason i was adding was to avoid '0' as the random number in my code. rand()+rand() was the quick dirty solution which readily came to my mind

It sounds like an XY problem to me: in order to not get a 0 from rand(), you call rand() two times, making the program slower, introducing a new setback, and the possibility of getting a 0 is still there.

Another solution is using std::uniform_int_distribution (C++11), which creates a random and uniformly distributed number in the defined interval:

https://wandbox.org/permlink/QKIHG4ghwJf1b7ZN

#include <random>
#include <array>
#include <iostream>
 
int main()
{
    const int MAX_VALUE=50;
    const int MIN_VALUE=1;
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> distrib(MIN_VALUE, MAX_VALUE);
    std::array<int,MAX_VALUE-MIN_VALUE+1> weight={0};

    for(int i=0; i<50000; i++) {
        weight[distrib(gen)-MIN_VALUE]++;
    }
    
    for(int i=0;i<(int)weight.size();i++) {
        std::cout << "value: " << MIN_VALUE+i << " times: " << weight[i] << std::endl; 
    }
}
Jose
  • 3,306
  • 1
  • 17
  • 22
-2

While the overflow that everyone else has described could very well be the cause of the negative numbers (even when you use unsigned integers), the real problem is actually using the time/date functionality as the seed. If you have truly become familiar with this functionality, you will know exactly why I say this: what it really does is give a distance (elapsed time) since a given date/time. While using the date/time functionality as the seed to rand() is a very common practice, it really is not the best option. You should search for better alternatives, as there are many theories on the topic and I could not possibly go into all of them. Add into this equation the possibility of overflow, and this approach was doomed from the beginning.

Those who posted the rand()+1 solution are using what most use in order to guarantee that they do not get a negative number. But that approach is really not the best way either.

The best thing you can do is take the extra time to write and use proper exception handling, and only add to the rand() number if and/or when you end up with a zero result. And, to deal with negative numbers properly. The rand() functionality is not perfect, and therefore needs to be used in conjunction with exception handling to ensure that you end up with the desired result.

Taking the extra time and effort to investigate, study, and properly implement the rand() functionality is well worth the time and effort. Just my two cents. Good luck in your endeavors...

Mark Krug
  • 53
  • 3
  • 2
    `rand()` does not specify what seed to use. The standard _does_ specify it to use a pseudorandom generator, not a relation to any time. It also does not state anything about the quality of the generator. The actual problem is clearly the overflow. Note that `rand()+1` is used to avoid `0`; `rand()` does not return a negative value. Sorry, but you did miss the point here. It is not about the quality of the PRNG. ... – too honest for this site Jun 16 '15 at 10:49
  • ... Good practice under GNU/Linux is to seed from `/dev/random` and use a good PRNG afterwards (not sure about the quality of `rand()` from glibc) or continue using the device - risking your application blocking if there is not enough entropy available. Trying to get your entropy in the application might very well be a vulnerability as that is possibly easier to attack. And now it comes to hardening - not here – too honest for this site Jun 16 '15 at 10:50