14

I'm in the process of working on a programming project that involves some pretty extensive Monte Carlo simulation in Python, and as such the generation of a tremendous number of random numbers. Nearly all of them, if not all, can be generated by Python's built-in random module.

I'm something of a coding newbie and unfamiliar with efficient and inefficient ways to do things. Is it faster to generate, say, all the random numbers as a list and then iterate through that list, or to generate a new random number each time a function is called inside a very large loop?

Or some other, undoubtedly more clever method?

Fomite
  • 2,213
  • 7
  • 30
  • 46
  • If you are using Linux, reading a bunch of numbers from `/dev/random` into a file and using them thereafter would be faster, and the numbers will potentially be of better quality. Not that Python's random generator is bad or anything. – ktdrv Nov 02 '11 at 23:26
  • 2
    @kaloyan: If the OP uses `random.SystemRandom` instead of the `random` module, then that would utilize `/dev/urandom` on *nix and `CryptGenRandom` on Windows. Which is more than sufficient. – rossipedia Nov 03 '11 at 00:08
  • @BryanRoss: True, unless the OP also wants reproducibility. – ktdrv Nov 03 '11 at 00:37
  • While the speed of using SystemRandom might be appealing, I'll probably end up favoring the standard random module for reproducibility (a small sketch of both options follows these comments). The quality of the RNG only needs to "adequately wander about the distribution" rather than meet any cryptographic concerns. – Fomite Nov 03 '11 at 00:51
  • 4
    A much faster method is to use NumPy's random.randint method. Generating an array of 1 million random integers in Numpy (on my rather old machine) took 45ms, versus 5.5s in a Python loop. – joshayers Nov 03 '11 at 04:21
  • @bellamyj Does NumPy have RNGs besides the integer one? Because transforming them into another distribution I need could take a while. – Fomite Nov 03 '11 at 04:33
  • 1
    How often are you going to use them? Or - how extensive (slow) is your Monte Carlo simulation? If each run takes 24 hours, then it doesn't really matter whether generating 1M numbers takes milliseconds or seconds. Only optimize something if it matters... – John C Nov 03 '11 at 15:16
  • @JohnC The simulation itself is rather fast - on the order of seconds or minutes, rather than hours or days (I occasionally have those too). Your advice about optimizing is appreciated, and normally what I do, but since I haven't implemented this at all, and the coding is roughly equivalent in terms of writing, I figured why not go with the faster version? – Fomite Nov 03 '11 at 16:54
  • @EpiGrad Yes numpy.random has methods to output random variables drawn from many different distributions. – joshayers Dec 06 '11 at 05:38
  • In all this discussion about differently efficient ways to generate random numbers, I do wonder - how do you verify that different ways generate equally random numbers? Some statistical analysis? – Frerich Raabe Oct 08 '13 at 21:15
  • 1
    @FrerichRaabe I know for my particular application, I wouldn't bother - I'm willing to accept a fairly crude randomness. – Fomite Oct 08 '13 at 21:58
  • @EpiGrad Meta-+1 for your comment for answering after I commented on your question about two years after it was asked. :-) – Frerich Raabe Oct 09 '13 at 07:17
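
As mentioned in the comments above, here is a minimal sketch of the trade-off between reproducibility and OS-sourced randomness (standard library only; the seed value is arbitrary):

import random

# Reproducible: the same seed replays the same stream of draws.
rng = random.Random(42)
print([rng.randint(0, 80) for _ in range(5)])

# OS entropy (/dev/urandom on *nix, CryptGenRandom on Windows): better quality,
# but not reproducible -- SystemRandom ignores seed().
sysrng = random.SystemRandom()
print(sysrng.random())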

5 Answers

4

Python's built-in random module, e.g. random.random() and random.randint() (some distributions are also available; you probably want the Gaussian one, random.gauss()), does about 300K samples/s.
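
A rough way to check the per-sample rate on a given machine (a small sketch using timeit):

import random
import timeit

# Time one million draws and report samples per second.
n = 1000000
t = timeit.timeit(random.random, number=n)
print("random.random: %.0f samples/s" % (n / t))
t = timeit.timeit(lambda: random.gauss(0, 1), number=n)
print("random.gauss:  %.0f samples/s" % (n / t))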

Since you are doing numerical computation, you probably use numpy anyway, which offers better performance if you cook up random numbers one array at a time instead of one number at a time, plus a wider choice of distributions: 60K arrays/s * 1024 (array length) is ~60M samples/s.
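
For example (a sketch assuming numpy is installed; the sizes are arbitrary):

import numpy as np

# One call fills a whole array; the per-sample overhead of a Python loop disappears.
uniform = np.random.random(size=1024)             # uniform on [0, 1)
gaussian = np.random.normal(0.0, 1.0, size=1024)  # standard normal
ints = np.random.randint(0, 81, size=1024)        # integers in [0, 80]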

You can also read /dev/urandom on Linux and OSX; my hw/sw (an OSX laptop) manages ~10MB/s.
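
A portable version of the same idea is os.urandom (a sketch mapping 8 raw bytes to one float in [0, 1)):

import os
import struct

# 8 bytes from the OS entropy pool, interpreted as an unsigned 64-bit integer.
raw = os.urandom(8)
value = struct.unpack(">Q", raw)[0] / float(2 ** 64)
print(value)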

Surely there must be faster ways to generate random numbers en masse, e.g.:

from Crypto.Cipher import AES
from Crypto.Util import Counter
import secrets

# AES in CTR mode is a fast keystream generator: encrypting a constant buffer
# produces a stream of pseudorandom bytes.
aes = AES.new(secrets.token_bytes(16), AES.MODE_CTR, counter=Counter.new(128))
data = b"0" * 2 ** 20  # 1 MiB plaintext buffer; its content doesn't matter
with open("filler.bin", "wb") as f:
    while True:  # runs until interrupted; each pass appends 1 MiB of random bytes
        f.write(aes.encrypt(data))

This generates 200MB/s on a single core of an i5-4670K.

Common ciphers like AES and Blowfish manage 112MB/s and 70MB/s on my stack. Furthermore, modern processors make AES even faster, up to some 700MB/s (the link to test runs on a few hardware combinations is now broken). You could use the weaker ECB mode, provided you feed distinct inputs into it, and achieve up to 3GB/s.
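
A sketch of that ECB idea, assuming the PyCryptodome Crypto package; random_bytes and the counter layout are only illustrative:

from Crypto.Cipher import AES
import itertools
import secrets
import struct

ecb = AES.new(secrets.token_bytes(16), AES.MODE_ECB)
counter = itertools.count()

def random_bytes(n_blocks):
    # Encrypt n_blocks distinct 16-byte counter values in one call; under a
    # fixed key, distinct inputs produce independent-looking pseudorandom blocks.
    plaintext = b"".join(struct.pack(">QQ", 0, next(counter)) for _ in range(n_blocks))
    return ecb.encrypt(plaintext)

buf = random_bytes(65536)  # ~1 MiB of pseudorandom bytes per call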

Stream ciphers are better suited to the task, e.g. RC4 tops out at 300MB/s on my hardware. You may get the best results from the most popular ciphers, as more effort has been spent optimising those in both hardware and software.

Dima Tisnek
  • 11,241
  • 4
  • 68
  • 120
4
import random
for x in (random.randint(0,80) for x in xrange(1000*1000)):
    print x

The code between the parentheses generates one item at a time, so it never builds the whole list in memory.

ychaouche
  • 4,922
  • 2
  • 44
  • 52
4

Generate a random number each time. Since the inner workings of the loop only care about a single random number, generate and use it inside the loop.

Example:

# do this:
import random

for x in xrange(SOMEVERYLARGENUMBER):
    n = random.randint(1,1000) # whatever your range of random numbers is
    # Do stuff with n

# don't do this:
import random

# This list comprehension generates random numbers in a list
numbers = [random.randint(1,1000) for x in xrange(SOMEVERYLARGENUMBER)]

for n in numbers:
    # Do stuff with n

Obviously, in practical terms it really doesn't matter, unless you're dealing with billions and billions of iterations, but why bother generating all those numbers if you're only going to be using one at a time?

rossipedia
  • 56,800
  • 10
  • 90
  • 93
  • 7
    Important sidenote: If you generate billions of numbers and store them in a list, you will eventually run out of memory. If you generate those numbers on demand, this is not an issue. So for large amounts of numbers you **have to** generate them when needed. – hochl Nov 02 '11 at 23:28
  • 1
    random.randint() only makes some 300K numbers/s on my hw. that's pretty slow. – Dima Tisnek Oct 08 '13 at 18:25
  • 2
    You can do the `don't do this` approach just fine if you just use a generator instead of a list, i.e. change `[random.randint..]` to `(random.randint..)`. – Frerich Raabe Oct 08 '13 at 21:14
  • Yeah, but it wouldn't really be as readable. Remember: ["Readability counts."](http://www.python.org/dev/peps/pep-0020/) – rossipedia Oct 09 '13 at 19:13
2

Code to generate 10M random numbers:

import random

l = 10000000
listrandom = []
for i in range(l):
    listrandom.append(random.randint(0, l))
print listrandom

The time taken includes the I/O time spent printing to the screen:

real    0m27.116s
user    0m24.391s
sys 0m0.819s
shashankS
  • 1,043
  • 1
  • 11
  • 21
0

Using NumPy:

import numpy as np

# low is inclusive, high is exclusive, size is the number of samples
np.random.randint(low=1, high=100, size=1000000)

This returns a NumPy array of one million integers drawn uniformly from 1 to 99.