2

I want to draw a random sample without replacement N times, like the following:

import numpy as np

sample = np.zeros([100000, 4], int)
for i in range(100000):
    sample[i] = np.random.choice(128, 4, replace=False)

If the number of iterations becomes very large, the overall sampling is time-consuming. Is there any way to speed up this sampling?

EmbraceZXS
  • Look at this article: https://medium.freecodecamp.org/how-to-get-embarrassingly-fast-random-subset-sampling-with-python-da9b27d494d9 – iRhonin Apr 27 '19 at 12:57
  • So you want a 2D array where one dimension is large and the other is small? – Yasin Yousif Apr 27 '19 at 13:01
  • So you want to sample without replacement in one dimension, but with replacement in the other (or to do that sample repeatedly)? – hpaulj Apr 27 '19 at 14:03

3 Answers

1

Your method

In [16]: sample = np.zeros([100000, 4], int)

In [17]: %timeit for i in range(100000): sample[i] = np.random.choice(128, 4, replace=False)
1 loop, best of 3: 2.5 s per loop

Alternatively, you can write:

In [149]: %timeit d=np.random.choice(128,100000);sample1=np.array([(d+x)%128 for x in np.random.choice(128,4)])
The slowest run took 4.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 4.11 ms per loop

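This is far faster on my machine.

It is less random, though: every sample reuses the same four offsets, and the offsets themselves are drawn with replacement, so whether it is acceptable depends on your application. Note also that the result has shape (4, 100000) rather than (100000, 4). After all, a for loop is very slow in plain Python; you may be interested in Cython or Numba.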
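If the shared offsets are a problem, one loop-free alternative is to give every row its own random permutation and keep the first few entries. The sketch below is not from the answer above: the function name `sample_rows_without_replacement` is made up for illustration, and it assumes a NumPy version that provides `np.random.default_rng`. It trades memory for speed, since it allocates an `n_rows x population` array of random keys.

```python
import numpy as np

def sample_rows_without_replacement(n_rows, population, k, seed=None):
    """Return an (n_rows, k) array; each row holds k distinct ints from range(population)."""
    if k > population:
        raise ValueError("k must not exceed population")
    rng = np.random.default_rng(seed)
    # One uniform random key per (row, candidate value).
    keys = rng.random((n_rows, population))
    # Sorting the keys of a row yields a random permutation of range(population);
    # the first k indices of that permutation are a sample without replacement.
    return np.argsort(keys, axis=1)[:, :k]

sample = sample_rows_without_replacement(100000, 128, 4)
print(sample.shape)
```

For a much larger population, sorting every row in full becomes wasteful, and a different approach is likely a better fit.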

Yasin Yousif
0

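This will give you random integers from range(0, 128) in an array of shape (100000, 4). Note that the values are drawn with replacement, so a row may contain duplicates: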

np.random.randint(128, size=(100000,4))
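Because `randint` can repeat a value within a row, one way to adapt it to sampling without replacement is to redraw only the rows that contain a duplicate. This is a rough sketch of that idea rather than part of the answer: the helper name `randint_rows_unique` is hypothetical, and it assumes `k` is small relative to the population, so few rows need redrawing.

```python
import numpy as np

def randint_rows_unique(n_rows, population, k, seed=None):
    """Draw rows of k ints from range(population), redrawing rows that contain repeats."""
    if k > population:
        raise ValueError("k must not exceed population")
    rng = np.random.default_rng(seed)
    sample = rng.integers(population, size=(n_rows, k))
    while True:
        # After sorting each row, a duplicate shows up as two equal neighbours.
        s = np.sort(sample, axis=1)
        bad = (s[:, 1:] == s[:, :-1]).any(axis=1)
        n_bad = int(bad.sum())
        if n_bad == 0:
            return sample
        # Redraw only the offending rows and check again.
        sample[bad] = rng.integers(population, size=(n_bad, k))

sample = randint_rows_unique(100000, 128, 4)
print(sample.shape)
```

With 128 values and 4 draws per row, only a few percent of rows should need redrawing, so the loop usually finishes after a couple of passes.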
Nic Wanavit
0

Use random.sample instead of np.random.choice:

In [16]: import random
    ...: import time
    ...: start_time = time.time()
    ...: sample = np.zeros([100000, 4], int)
    ...: for i in range(100000):
    ...:     sample[i] = random.sample(range(128), 4)
    ...: print("--- %s seconds ---" % (time.time() - start_time))
    ...: 
--- 0.7096474170684814 seconds ---

In [17]: import time
    ...: start_time = time.time()
    ...: sample = np.zeros([100000, 4], int)
    ...: for i in range(100000):
    ...:     sample[i] = np.random.choice(128, 4, replace=False)
    ...: print("--- %s seconds ---" % (time.time() - start_time))
    ...: 
--- 5.2036824226379395 seconds ---
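A single wall-clock measurement like the two above can be noisy. As a rough sketch, the same comparison can be repeated with the standard-library `timeit` module and the best run reported. The function names and the reduced row count `N` are choices made for this example, not part of the original timing.

```python
import random
import timeit

import numpy as np

N = 1000  # fewer rows than the 100000 above, to keep the benchmark short

def with_random_sample():
    sample = np.zeros([N, 4], int)
    for i in range(N):
        sample[i] = random.sample(range(128), 4)
    return sample

def with_np_choice():
    sample = np.zeros([N, 4], int)
    for i in range(N):
        sample[i] = np.random.choice(128, 4, replace=False)
    return sample

def best_time(fn, repeat=5):
    # Run fn once per measurement and keep the fastest of `repeat` measurements.
    return min(timeit.repeat(fn, number=1, repeat=repeat))

for fn in (with_random_sample, with_np_choice):
    print(f"{fn.__name__}: best of 5 = {best_time(fn):.4f} s")
```

Taking the minimum over several runs reduces the influence of other processes on the reported figure.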
iRhonin
  • Oh man. Use `timeit` for timing runtime, but `np.random.choice` is not meant to be called in a loop - you can pass a `shape` to it and call it once. This timing doesn't really make much sense. Anything can affect your timings if you run it once – roganjosh Apr 27 '19 at 13:05