1

I was going through Google Interview Questions. to implement the random number generation from 1 to 7. I did write a simple code, I would like to understand if in the interview this question asked to me and if I write the below code is it Acceptable or not?

import time
def generate_rand():
    ret = str(time.time()) # time in second like, 12345.1234
    ret = int(ret[-1])
    if ret == 0 or ret == 1:
        return 1
    elif ret > 7:
        ret = ret - 7
        return ret
    return ret

while 1:
    print(generate_rand())
    time.sleep(1) # Just to see the output in the STDOUT
Rasmi Ranjan Nayak
  • 11,510
  • 29
  • 82
  • 122
  • I would use a modulo function and handle the special case of modulo returning 0. – sloppypasta Jan 18 '17 at 17:19
  • @morbidlycurious: Can you explain little more? – Rasmi Ranjan Nayak Jan 18 '17 at 17:21
  • Is it possible to use built-in/library `random`functions? – MBo Jan 18 '17 at 17:30
  • The main problem I see is that, because of the way you do your clamps for out-of-range values, the probability for each the digits 1-7 will not be the same. Also, converting a binary to string and then back to binary is a smell for me. Better to do something like (pseudo-code): `result := MillisecondClockCount mod 7 + 1;`. – 500 - Internal Server Error Jan 18 '17 at 17:30
  • 1
    That is definitely not getting you hired. – user2357112 Jan 18 '17 at 17:30
  • 1
    Pseudo-random generators usually (always?) work by having an internal state, computing the next number from the internal state, then updating the internal state with the computed value. Time is not a good source of pseudo-randomness, but it is often used to provide an initial state for your PRNG generator. – biziclop Jan 18 '17 at 17:58
  • 1
    Might want to tag it with a language. – MSalters Jan 18 '17 at 18:39

3 Answers3

2

(Since the question seems to ask for analysis of issues in the code and not a solution, I am not providing one. )

The answer is unacceptable because:

  1. You need to wait for a second for each random number. Many applications need a few hundred at a time. (If the sleep is just for convenience, note that even a microsecond granularity will not yield true random numbers as the last microsecond will be monotonically increasing until 10us are reached. You may get more than a few calls done in a span of 10us and there will be a set of monotonically increasing pseudo-random numbers).

  2. Random numbers have uniform distribution. Each element should have the same probability in theory. In this case, you skew 1 more (twice the probability for 0, 1) and 7 more (thrice the probability for 7, 8, 9) compared to the others in the range 2-6.

Typically answers to this sort of a question will try to get a large range of numbers and distribute the ranges evenly from 1-7. For example, the above method would have worked fine if u had wanted randomness from 1-5 as 10 is evenly divisible by 5. Note that this will only solve (2) above.

For (1), there are other sources of randomness, such as /dev/random on a Linux OS.

user1952500
  • 6,611
  • 3
  • 24
  • 37
  • I put `sleep(1)` to see the output. In reality it is not required. – Rasmi Ranjan Nayak Jan 18 '17 at 18:33
  • In that case you are using the microsecond field, correct ? Then you may see that the numbers are not so random after all. – user1952500 Jan 18 '17 at 18:35
  • @RasmiRanjanNayak updated answer to explain what will happen with time. – user1952500 Jan 18 '17 at 18:37
  • 3
    There's also the question whether the digits of time are random at all. Computers don't keep track of time in decimal fractions of a second, so the decimal value obtained is the result of a conversion. This might introduce another bias. – MSalters Jan 18 '17 at 18:45
1

You haven't really specified the constraints of the problem you're trying to solve, but if it's from a collection of interview questions it seems likely that it might be something like this.

In any case, the answer shown would not be acceptable for the following reasons:

  1. The distribution of the results is not uniform, even if the samples you read from time.time() are uniform.

  2. The results from time.time() will probably not be uniform. The result depends on the time at which you make the call, and if your calls are not uniformly distributed in time then the results will probably not be uniformly distributed either. In the worst case, if you're trying to randomise an array on a very fast processor then you might complete the entire operation before the time changes, so the whole array would be filled with the same value. Or at least large chunks of it would be.

  3. The changes to the random value are highly predictable and can be inferred from the speed at which your program runs. In the very-fast-computer case you'll get a bunch of x followed by a bunch of x+1, but even if the computer is much slower or the clock is more precise, you're likely to get aliasing patterns which behave in a similarly predictable way.

  4. Since you take the time value in decimal, it's likely that the least significant digit doesn't visit all possible values uniformly. It's most likely a conversion from binary to some arbitrary number of decimal digits, and the distribution of the least significant digit can be quite uneven when that happens.

  5. The code should be much simpler. It's a complicated solution with many special cases, which reflects a piecemeal approach to the problem rather than an understanding of the relevant principles. An ideal solution would make the behaviour self-evident without having to consider each case individually.

The last one would probably end the interview, I'm afraid. Perhaps not if you could tell a good story about how you got there.

You need to understand the pigeonhole principle to begin to develop a solution. It looks like you're reducing the time to its least significant decimal digit for possible values 0 to 9. Legal results are 1 to 7. If you have seven pigeonholes and ten pigeons then you can start by putting your first seven pigeons into one hole each, but then you have three pigeons left. There's nowhere that you can put the remaining three pigeons (provided you only use whole pigeons) such that every hole has the same number of pigeons.

The problem is that if you pick a pigeon at random and ask what hole it's in, the answer is more likely to be a hole with two pigeons than a hole with one. This is what's called "non-uniform", and it causes all sorts of problems, depending on what you need your random numbers for.

You would either need to figure out how to ensure that all holes are filled equally, or you would have to come up with an explanation for why it doesn't matter.

Typically the "doesn't matter" answer is that each hole has either a million or a million and one pigeons in it, and for the scale of problem you're working with the bias would be undetectable.

Community
  • 1
  • 1
sh1
  • 4,324
  • 17
  • 30
0

Using the same general architecture you've created, I would do something like this:

import time
def generate_rand():
    ret = str(time.time()) # time in second like, 12345.1234
    ret = ret % 8 # will return pseudorandom numbers 0-7
    if ret == 0:
        return 1 # or you could also return the result of another call to generate_rand()
    return ret

while 1:
    print(generate_rand())
    time.sleep(1)
sloppypasta
  • 1,068
  • 9
  • 15