I am trying to generate a bootstrap sample using two different methods:

import numpy as np
np.random.seed(335)
y=np.random.normal(0,1,5)
b=np.empty(len(y)) #initializes an empty vector
for j in range(len(y)):
    a = np.random.randint(1,len(y)) #Intended to draw a random integer from 1 to n, where n is our sample size
    b[j] = y[a-1] #indices in python start at zero, the worst part of Python in my opinion
c = np.random.choice(y, size=5)
print(b)
print(c)

and the two methods give different results:

[1.04749432 1.71963433 1.71963433 1.71963433 1.71963433]
[-0.25224454 -0.25224454  0.46604474  1.71963433  0.46604474]

I think the answer has something to do with the random number generator, but I'm confused as to the exact reason.

Peter O.
Greg

1 Answer

This comes down to the use of different algorithms for randomized selection. There are numerous equivalent ways to select items at random with replacement using a pseudorandom generator (or to generate random variates from any other distribution). In particular, the algorithm behind numpy.random.choice need not, in theory, make use of numpy.random.randint. What matters is that these equivalent methods produce the same distribution of random variates. To see which algorithm each function actually uses, look at NumPy's source code.
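As a rough illustration of "different draws, same distribution", the sketch below samples with replacement in two ways and compares how often each value comes up. The values in y, the seed, and the number of draws are arbitrary choices made for this example, not taken from the question.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for reproducibility
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # illustrative data
n_draws = 100_000

# Method 1: draw zero-based indices in [0, n) and index into y.
idx = np.random.randint(0, len(y), size=n_draws)
sample_a = y[idx]

# Method 2: let choice do the selection (with replacement by default).
sample_b = np.random.choice(y, size=n_draws)

# Relative frequency of each value of y under both methods.
freq_a = np.array([(sample_a == v).mean() for v in y])
freq_b = np.array([(sample_b == v).mean() for v in y])
print(freq_a)
print(freq_b)
```

The individual draws in sample_a and sample_b differ, but both sets of frequencies should sit close to 1/5, which is the sense in which the two methods are equivalent.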

Another, less important, reason for the different results is that the two selection procedures (randint and choice) each consume pseudorandom numbers from the same global generator, so the second procedure starts from a different point in the pseudorandom sequence than the first one did. If we set the seed to the same value before beginning each procedure:

import numpy as np

np.random.seed(335)
y=np.random.normal(0,1,5)
b=np.empty(len(y))
np.random.seed(999999)  # Seed selection procedure 1
for j in range(len(y)):
    a = np.random.randint(0,len(y))  # Upper bound is exclusive, so this draws an index from 0 to n-1
    b[j] = y[a]
np.random.seed(999999)  # Seed selection procedure 2
c = np.random.choice(y, size=5)
print(b)
print(c)

then each procedure will begin with the same pseudorandom numbers. But even so, the two procedures may use different algorithms for random selection, and these differences may still lead to different results.
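Separately from seeding, one detail in the question's loop is worth checking: np.random.randint(low, high) excludes high, so randint(1, len(y)) only returns values from 1 to n-1, and y[a-1] can never pick the last element of y. A minimal sketch that checks this empirically (the seed, n, and the number of draws are arbitrary choices for illustration):

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for reproducibility
n = 5

# 10,000 draws with the same bounds as the question's loop.
draws = np.random.randint(1, n, size=10_000)
values_seen = sorted(set(draws.tolist()))
print(values_seen)  # n itself never appears

# Drawing zero-based indices directly covers every position of y.
indices = np.random.randint(0, n, size=10_000)
indices_seen = sorted(set(indices.tolist()))
print(indices_seen)
```

This also explains why the loop-based sample in the question is not a proper bootstrap sample: one of the five observations has zero probability of being selected.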

(Note, however, that numpy.random.* functions such as randint and choice became legacy functions as of NumPy 1.17, and their algorithms are expected to remain as they are for backward compatibility. That version didn't deprecate any numpy.random.* functions, so they are still available for the time being. See also this question. In newer applications, you should use the new system introduced in version 1.17, including numpy.random.Generator, if you have that version or later. One advantage of the new system is that the application relies less on global state.)
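For reference, a bootstrap sample with the newer system might look like the sketch below, assuming NumPy 1.17 or later. The seed 335 is reused from the question only for familiarity; the new generator will not reproduce the legacy values of y shown there.

```python
import numpy as np

rng = np.random.default_rng(335)  # local generator; no global state involved
y = rng.normal(0, 1, 5)

# Bootstrap sample: draw len(y) values from y with replacement.
boot = rng.choice(y, size=len(y), replace=True)

# Equivalent in distribution: draw indices in [0, n) and index into y.
idx = rng.integers(0, len(y), size=len(y))
boot_by_index = y[idx]

print(boot)
print(boot_by_index)
```

As with the legacy functions, the two samples will generally differ from each other, since each call consumes its own pseudorandom numbers from rng.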

Peter O.