My use case is a bit specific. I want to sample 2 items without replacement from a list/array of 50 or 100 elements, so I don't have to worry about arrays of size 10^4 or 10^5, or about multidimensional data.
I want to know:
- Which one, numpy.random.choice() or numpy.random.shuffle(), is faster for this purpose, and why?
- Whether both produce random samples of "good quality". That is, do both generate good random samples for my purpose, or does one produce less random samples? (Just a sanity check to make sure I am not overlooking something in the source code of these functions.)
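As a rough empirical sanity check for the second question (it can catch gross bias, not prove quality), one could count how often each unordered pair is drawn by each method and compute a Pearson chi-square statistic. This sketch assumes np.random.RandomState(seed) as a seeded stand-in for the global np.random functions used in the question; the population size of 10 and the 45,000 trials are arbitrary choices, and pair_counts, choice_sampler and shuffle_sampler are hypothetical helper names.

```python
from collections import Counter

import numpy as np


def pair_counts(sampler, size, trials, seed=0):
    # Count how often each unordered pair {a, b} is produced by `sampler`.
    rng = np.random.RandomState(seed)
    counts = Counter()
    for _ in range(trials):
        a, b = sampler(rng, size)
        counts[frozenset((int(a), int(b)))] += 1
    return counts


def choice_sampler(rng, size):
    # Two distinct indices from range(size), as in chooser().
    return rng.choice(size, 2, replace=False)


def shuffle_sampler(rng, size):
    # Shuffle a fresh index list and keep the first two, as in shuffler().
    items = list(range(size))
    rng.shuffle(items)
    return items[:2]


size, trials = 10, 45_000
n_pairs = size * (size - 1) // 2      # 45 unordered pairs
expected = trials / n_pairs           # 1000 draws per pair if sampling is uniform
results = {}
for name, sampler in [("choice", choice_sampler), ("shuffle", shuffle_sampler)]:
    counts = pair_counts(sampler, size, trials)
    # Pearson chi-square; for a uniform sampler its mean is n_pairs - 1 (44).
    chi2 = sum((c - expected) ** 2 / expected for c in counts.values())
    # Pairs that were never drawn each contribute (0 - expected)^2 / expected.
    chi2 += (n_pairs - len(counts)) * expected
    results[name] = (len(counts), chi2)
    print(f"{name}: {len(counts)} distinct pairs, chi-square = {chi2:.1f}")
```

For a uniform sampler the statistic should land near the degrees of freedom (44, with a standard deviation of about 9.4); a value far above that would suggest bias.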
For Question 1, I tried timing both functions (code below), and the shuffle method seems to be about 7 times faster (1.84 µs vs. 13 µs per call). Any insight you can give about this is most welcome. If there are faster ways to achieve my purpose, I will be glad to hear about them. (I looked at the options in Python's random module, but the fastest method in my testing was np.random.shuffle().)
```python
import numpy as np


def shuffler(size, num_samples):
    # Shuffle a fresh list of indices in place and keep the first num_samples.
    items = list(range(size))
    np.random.shuffle(items)
    return items[:num_samples]


def chooser(size, num_samples):
    # Draw num_samples distinct indices from range(size).
    return np.random.choice(size, num_samples, replace=False)
```

```python
# In IPython/Jupyter:
%timeit shuffler(50, 2)
#> 1.84 µs ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit chooser(50, 2)
#> 13 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
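For reproducing the comparison outside IPython, here is a sketch that uses the standard-library timeit module in place of %timeit. It also adds gen_chooser, a hypothetical helper that makes the same draw through the newer numpy.random.default_rng() Generator API. No numbers are claimed for it here; run it to see where it lands on your machine. The loop count of 10,000 per run (instead of %timeit's 100,000) is an arbitrary choice to keep the run short.

```python
import statistics
import timeit

import numpy as np

rng = np.random.default_rng()  # newer Generator API, included for comparison


def shuffler(size, num_samples):
    items = list(range(size))
    np.random.shuffle(items)
    return items[:num_samples]


def chooser(size, num_samples):
    return np.random.choice(size, num_samples, replace=False)


def gen_chooser(size, num_samples):
    # Same draw through numpy.random.Generator.choice.
    return rng.choice(size, num_samples, replace=False)


def time_per_call_us(func, *args, number=10_000, repeat=7):
    # Like %timeit: `repeat` runs of `number` calls; per-call mean and std. dev. in µs.
    runs = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
    per_call = [t / number * 1e6 for t in runs]
    return statistics.mean(per_call), statistics.stdev(per_call)


timings = {}
for f in (shuffler, chooser, gen_chooser):
    mean, std = time_per_call_us(f, 50, 2)
    timings[f.__name__] = mean
    print(f"{f.__name__}: {mean:.2f} µs ± {std:.2f} µs per call")
```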
You may think it's already optimized and that I am wasting time trying to save pennies, but np.random.choice() is called 5,000,000 times in my code and accounts for about 8% of my runtime. It is used in a loop to draw 2 random samples from the population on each iteration.
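Simplified version of the loop (the population here is a hypothetical stand-in):

```python
import numpy as np

population = np.arange(50)  # hypothetical stand-in for the real population
for t in range(5000000):
    # Random sample of 2 from the population without replacement.
    i, j = np.random.choice(len(population), 2, replace=False)
```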
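Because the loop always draws from a population of the same size, one option is to draw all 5,000,000 pairs up front in a few vectorized calls instead of calling np.random.choice() once per iteration. This sketch assumes the population size n stays fixed across iterations (if it changes, the pairs can't be precomputed this way) and that holding two int64 arrays of 5,000,000 entries (about 80 MB) is acceptable; random_pairs is a hypothetical helper name and the seed 42 is arbitrary. The trick is exact rather than approximate: i is uniform over range(n), and j is drawn uniformly from the n - 1 other values by sampling range(n - 1) and shifting every value that is at least i up by one.

```python
import numpy as np


def random_pairs(n, num_pairs, rng):
    # Draw num_pairs ordered pairs (i, j) of distinct indices in range(n), uniformly.
    i = rng.integers(0, n, size=num_pairs)
    # j starts uniform over range(n - 1); shifting values >= i up by one
    # maps range(n - 1) one-to-one onto range(n) with i removed.
    j = rng.integers(0, n - 1, size=num_pairs)
    j += j >= i
    return i, j


rng = np.random.default_rng(42)
num_iterations = 5_000_000  # number of loop iterations in the question
first, second = random_pairs(50, num_iterations, rng)
```

Inside the loop, first[t] and second[t] would then replace the per-iteration np.random.choice() call. Like np.random.choice(n, 2, replace=False), each pair is ordered.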
If there is a smarter implementation for my requirements, I am open to suggestions.
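If the draws have to happen one at a time inside the loop, the same skip-one trick works with scalar calls from Python's random module, which avoids building a list or array on every call. This is an unbenchmarked sketch, so whether it beats shuffler() in practice would need timing; two_distinct is a hypothetical name.

```python
import random


def two_distinct(n):
    # Ordered pair of distinct indices from range(n), uniform over all n * (n - 1) pairs.
    i = random.randrange(n)
    # Draw from the n - 1 remaining values, skipping i by shifting values >= i up by one.
    j = random.randrange(n - 1)
    if j >= i:
        j += 1
    return i, j
```

Calling two_distinct(50) once per iteration returns a tuple such as (i, j) that can be used directly as indices into the population.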
PS: I am aware that shuffle performs an in-place operation, but since I only need the indices of the two random elements, I don't have to perform it on my original array. There are other questions that compare the two functions from Python's random module, but I require 2 samples without replacement.
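For reference, Python's random module does have a without-replacement sampler, random.sample, and sampling from range(n) yields indices directly. The sketch below, using a hypothetical stand-in population, shows that alongside the shuffle-an-index-array variant; in both cases the original array is only read, never reordered.

```python
import random

import numpy as np

population = np.linspace(0.0, 1.0, 50)  # hypothetical stand-in for the real population
original = population.copy()

# Option 1: random.sample draws k distinct elements; sampling range(n) gives indices.
i, j = random.sample(range(len(population)), 2)
pair_from_sample = population[[i, j]]

# Option 2: shuffle a separate index array in place; the population itself is untouched.
order = np.arange(len(population))
np.random.shuffle(order)
k, m = order[:2]
pair_from_shuffle = population[[k, m]]
```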