7

I'm new to Python. While reading, please mention any other suggestions regarding ways to improve my Python code.

Question: How do I generate a 8xN dimensional array in Python containing random numbers? The constraint is that each column of this array must contain 8 draws without replacement from the integer set [1,8]. More specifically, when N = 10, I want something like this.

[[ 6.  2.  3.  4.  7.  5.  5.  7.  8.  4.]
 [ 1.  4.  5.  5.  4.  4.  8.  5.  7.  5.]
 [ 7.  3.  8.  8.  3.  8.  7.  3.  6.  7.]
 [ 3.  6.  7.  1.  5.  6.  2.  1.  5.  1.]
 [ 8.  1.  4.  3.  8.  2.  3.  4.  3.  3.]
 [ 5.  8.  1.  7.  1.  3.  6.  8.  1.  6.]
 [ 4.  5.  2.  6.  2.  1.  1.  6.  4.  2.]
 [ 2.  7.  6.  2.  6.  7.  4.  2.  2.  8.]]

To do this I use the following approach:

import numpy.random
import numpy
def rand_M(N):
    M = numpy.zeros(shape = (8, N))
    for i in range (0, N):
        M[:, i] = numpy.random.choice(8, size = 8, replace = False) + 1 
    return M

In practice N will be ~1e7. The algorithm above is O(n) in time and it takes roughly .38 secs when N=1e3. The time therefore when N = 1e7 is ~1hr (i.e. 3800 secs). There has to be a much more efficient way.

Timing the function

from timeit import Timer 
t = Timer(lambda: rand_M(1000))
print(t.timeit(5))
0.3863314103162543
Divakar
  • 218,885
  • 19
  • 262
  • 358
Jacob H
  • 4,317
  • 2
  • 32
  • 39
  • I would refer to this question, seems to be what you need http://stackoverflow.com/questions/2709818/fastest-way-to-generate-1-000-000-random-numbers-in-python – Eric wright Aug 12 '15 at 03:53
  • @Ericwright thanks this is helpful thought I'm not sure it answers my question. My constraint is much different. Specifically, I have to make draw without replacement. This post is a good start though so thanks! – Jacob H Aug 12 '15 at 07:01

4 Answers4

9

Create a random array of specified shape and then sort along the axis where you want to keep the limits, thus giving us a vectorized and very efficient solution. This would be based on this smart answer to MATLAB randomly permuting columns differently. Here's the implementation -

Sample run -

In [122]: N = 10

In [123]: np.argsort(np.random.rand(8,N),axis=0)+1
Out[123]: 
array([[7, 3, 5, 1, 1, 5, 2, 4, 1, 4],
       [8, 4, 3, 2, 2, 8, 5, 5, 6, 2],
       [1, 2, 4, 6, 5, 4, 4, 3, 4, 7],
       [5, 6, 2, 5, 8, 2, 7, 8, 5, 8],
       [2, 8, 6, 3, 4, 7, 1, 1, 2, 6],
       [6, 7, 7, 8, 6, 6, 3, 2, 7, 3],
       [4, 1, 1, 4, 3, 3, 8, 6, 8, 1],
       [3, 5, 8, 7, 7, 1, 6, 7, 3, 5]], dtype=int64)

Runtime tests -

In [124]: def sortbased_rand8(N):
     ...:     return np.argsort(np.random.rand(8,N),axis=0)+1
     ...: 
     ...: def rand_M(N):
     ...:     M = np.zeros(shape = (8, N))
     ...:     for i in range (0, N):
     ...:         M[:, i] = np.random.choice(8, size = 8, replace = False) + 1 
     ...:     return M
     ...: 

In [125]: N = 5000

In [126]: %timeit sortbased_rand8(N)
100 loops, best of 3: 1.95 ms per loop

In [127]: %timeit rand_M(N)
1 loops, best of 3: 233 ms per loop

Thus, awaits a 120x speedup!

Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • Beware that there's a non-zero probability of collisions using this method: i.e. what if there are duplicate numbers in the output of `np.random.rand`? Also, to generate an MxN matrix like this has O(M log(M) N) time complexity, rather than the optimal O(M N). See https://en.wikipedia.org/wiki/Shuffling#Shuffling_algorithms – Nzbuu Dec 03 '20 at 12:57
1

How about shuffling, that is to say, permuting?

import random
import numpy
from timeit import Timer 

def B_rand_M(N):
    a = numpy.arange(1,9)
    M = numpy.zeros(shape = (8, N))
    for i in range (0, N):
        M[:, i] = numpy.random.permutation(a)
    return M

# your original implementation
def J_rand_M(N):
    M = numpy.zeros(shape = (8, N))
    for i in range (0, N):
        M[:, i] = numpy.random.choice(8, size = 8, replace = False) + 1 
    return M

some timings:

def compare(N):
    for f in (J_rand_M, B_rand_M):
        t = Timer(lambda: f(N)).timeit(6)
        print 'time for %s(%s): %.6f' % (f.__name__, N, t)

for i in range(6):
    print 'N = 10^%s' % i
    compare(10**i)
    print

gives

N = 10^0
time for J_rand_M(1): 0.001199
time for B_rand_M(1): 0.000080

N = 10^1
time for J_rand_M(10): 0.001112
time for B_rand_M(10): 0.000335

N = 10^2
time for J_rand_M(100): 0.011118
time for B_rand_M(100): 0.003022

N = 10^3
time for J_rand_M(1000): 0.110887
time for B_rand_M(1000): 0.030528

N = 10^4
time for J_rand_M(10000): 1.100540
time for B_rand_M(10000): 0.304696

N = 10^5
time for J_rand_M(100000): 11.151576
time for B_rand_M(100000): 3.049474
Béla
  • 284
  • 1
  • 2
  • 8
  • thanks, however, on my computer for large N your approach is slower. I suspect this is due to the fact that you do not pre-allocation space in memory for the `M`. Additionally the M you return is not an array but a list. I do, however, like your permuations idea. Thanks! – Jacob H Aug 12 '15 at 06:20
  • @JacobH see edited answer, I added pre-allocation and made it return a numpy array - I'm on a different machine now but it still seems a bit faster than your code. – Béla Aug 12 '15 at 12:09
  • Thanks, this looks great! – Jacob H Aug 12 '15 at 18:10
0

Just a comment on your runtime analysis of the problem - my intuition is that O(n) is the best possible runtime you can possibly obtain when generating O(n) truly random numbers.

Have you tried actually running your code with n = 10 million? Your assumption that the runtime will scale by 1000 when the input grows by a factor of 1000 may not be true in practice, as there is usually a constant term when executing any program (loading libraries, etc.), which may be significant depending on the problem.

That being said, it looks like the question linked by Eric Wright does a very thorough job and can easily be adapted to fit your question.

Community
  • 1
  • 1
Jacob Ritchie
  • 1,261
  • 11
  • 22
  • yep your right, I will not be able to get a better algorithm than O(n) complexity. However, I'm however, looking for a faster approach – Jacob H Aug 12 '15 at 04:15
0

Use below code for your array generation

import numpy as np
N=1e7 # THe value you want to have
np.random.randint(1,high=8,size=(8,N))

Hope this helps, it will surely not going to take that much time.

Shrey
  • 1,242
  • 1
  • 13
  • 27