Edit: using a combination of generating the random numbers in a block and multiprocessing, I got 100 million trials done in 6 minutes on my laptop, altogether about 12 times faster than the original code.
single process with block generation of random numbers
The following code runs 3x faster on my machine (15s vs 45s). The main change is to move all random number generation out of the main loop. If iters is so large that you don't have the memory to do that, you can run a nested loop: generate as large a block as you can afford, process it, and rinse and repeat -- I've put the code for this below, following the edit to your question.
import numpy as np
count = 0
iters = 1000000
l = 10
k = 10
l0 = l + k - 1   # compute the convolution length once, outside the loop
# generate all the random +/-1 values up front, in two big blocks
t = np.random.choice([-1, 1], size=l0 * iters)
v = np.random.choice([-1, 1], size=l * iters)
for i in xrange(iters):
    # slice out the i-th trial and test whether its 'valid' convolution is all zero
    if not np.convolve(v[(l*i):(l*(i+1))], t[(l0*i):(l0*(i+1))], 'valid').any():
        count += 1
print count
The other minor change, which improved the performance of your original code by about 2%, was to move the calculation l+k-1 outside of the loop. Incidentally, I found that the way you deal with count is quite efficient. So, for example, count += np.convolve(v[(l*i):(l*(i+1))], t[(l0*i):(l0*(i+1))], 'valid').any() and then doing iters - count at the end is slower. This is because the condition not ... any() is very rare, and the way you have it now count is touched very rarely.
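For comparison, here is a minimal sketch of that slower variant (assuming the same t, v, l, l0, and iters as in the block-generation code above), where count is updated on nearly every iteration and inverted at the end:
count = 0
for i in xrange(iters):
    # .any() is almost always True here, so count gets touched on nearly every pass
    count += np.convolve(v[(l*i):(l*(i+1))], t[(l0*i):(l0*(i+1))], 'valid').any()
print iters - count   # the rare all-zero convolutions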
To run 100 million times, set N=100 in the program below. With the current value of N=4 the program took 1 minute to execute (with a count of 26). With l=12, k=12, and N=4 the program took just over a minute to execute (with a count of 4). So you should be looking at less than half an hour for 100 million.
import numpy as np, time
start = time.clock()
count = 0
iters = 1000000  # 1 million trials per block
l = 10
k = 10
l0 = l + k - 1
N = 4  # number of millions (blocks of iters trials)
for n in range(N):
    # generate one block's worth of random numbers at a time
    t = np.random.choice([-1, 1], size=l0 * iters)
    v = np.random.choice([-1, 1], size=l * iters)
    for i in xrange(iters):
        if not np.convolve(v[(l*i):(l*(i+1))], t[(l0*i):(l0*(i+1))], 'valid').any():
            count += 1
print (time.clock() - start)
print count
multiple processes
Edit: this is an "embarrassingly parallel" problem, so you can run the simulations on multiple processors. An easy way to do this is using a pool of workers. Note however that it's important to set the random seed in each process; otherwise you risk having every process use the same set of random numbers (see here; I'm assuming this applies to numpy's random module as well as to the base random module). The code is below.
import numpy as np, time
from multiprocessing import Pool

def countconvolve(N):
    np.random.seed()  # ensure each worker process gets its own random seed
    count = 0
    iters = 1000000  # 1 million trials per block
    l = 12
    k = 12
    l0 = l + k - 1
    for n in range(N):
        t = np.random.choice([-1, 1], size=l0 * iters)
        v = np.random.choice([-1, 1], size=l * iters)
        for i in xrange(iters):
            if not np.convolve(v[(l*i):(l*(i+1))], t[(l0*i):(l0*(i+1))], 'valid').any():
                count += 1
    return count

if __name__ == '__main__':
    start = time.clock()
    num_processes = 8
    N = 13  # millions of trials per worker
    pool = Pool(processes=num_processes)
    res = pool.map(countconvolve, [N] * num_processes)
    print res, sum(res)
    print (time.clock() - start)
It ran 104 million simulations in 370 seconds, and produced the following output
[4, 9, 10, 6, 7, 8, 11, 9] 64
My laptop is a Core i7 with 8GB of memory, so with 8 cores I got a 4x speedup from parallel processing. I found the memory usage to be about 160MB per process (with a higher peak). If you have fewer cores or less memory, you would want to adjust the parameters in the program accordingly.
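As a rough illustration of that last point, here is a minimal sketch (not from the original answer) of how you might pick num_processes and N from the machine's core count, reusing the countconvolve function defined above; total_millions is a hypothetical name for the overall number of millions of trials you want.
from multiprocessing import Pool, cpu_count
total_millions = 104                   # hypothetical target: ~104 million trials overall
num_processes = cpu_count()            # one worker per available core
N = total_millions // num_processes    # millions of trials handled by each worker
pool = Pool(processes=num_processes)
res = pool.map(countconvolve, [N] * num_processes)
print sum(res)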
With @moarningsun's suggestion of constraining the array to have 1 byte per element, the memory usage dropped to 60MB per process.
t = np.random.choice(np.array([-1,1], dtype=np.int8), size=l0 * iters)
v = np.random.choice(np.array([-1,1], dtype=np.int8), size = l * iters)
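If you want to sanity-check the effect of the dtype yourself, a quick way (my addition, not part of the original answer) is to compare nbytes on the two versions of the array, assuming l0 and iters are defined as in the single-process code above:
t64 = np.random.choice([-1, 1], size=l0 * iters)                            # default integer dtype (int64 on most platforms)
t8  = np.random.choice(np.array([-1, 1], dtype=np.int8), size=l0 * iters)   # 1 byte per element
print t64.nbytes, t8.nbytes   # roughly an 8x difference in memory per block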