I developed a simple program to solve the eight queens problem. Now I would like to do some more testing with different meta-parameters, so I would like to make it fast. I went through a few iterations of profiling and was able to cut the runtime significantly, but I have reached the point where I believe only performing parts of the computation concurrently could make it faster. I tried to use the multiprocessing and concurrent.futures modules, but neither improved the runtime a lot, and in some cases they even slowed down execution. That is just to give some context.
I was able to come up with a similar code structure where the sequential version beats the concurrent one.
import numpy as np
import concurrent.futures
import math
import time
import multiprocessing


def is_prime(n):
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True


def generate_data(seed):
    np.random.seed(seed)
    numbers = []
    for _ in range(5000):
        nbr = np.random.randint(50000, 100000)
        numbers.append(nbr)
    return numbers


def run_test_concurrent(numbers):
    print("Concurrent test")
    start_tm = time.time()
    chunk = len(numbers) // 3
    primes = None
    with concurrent.futures.ProcessPoolExecutor(max_workers=3) as pool:
        primes = list(pool.map(is_prime, numbers, chunksize=chunk))
    print("Time: {:.6f}".format(time.time() - start_tm))
    print("Number of primes: {}\n".format(np.sum(primes)))


def run_test_sequential(numbers):
    print("Sequential test")
    start_tm = time.time()
    primes = [is_prime(nbr) for nbr in numbers]
    print("Time: {:.6f}".format(time.time() - start_tm))
    print("Number of primes: {}\n".format(np.sum(primes)))


def run_test_multiprocessing(numbers):
    print("Multiprocessing test")
    start_tm = time.time()
    chunk = len(numbers) // 3
    primes = None
    with multiprocessing.Pool(processes=3) as pool:
        primes = list(pool.map(is_prime, numbers, chunksize=chunk))
    print("Time: {:.6f}".format(time.time() - start_tm))
    print("Number of primes: {}\n".format(np.sum(primes)))


def main():
    nbr_trails = 5
    for trail in range(nbr_trails):
        numbers = generate_data(trail * 10)
        run_test_concurrent(numbers)
        run_test_sequential(numbers)
        run_test_multiprocessing(numbers)
        print("--\n")


if __name__ == '__main__':
    main()
When I run it on my machine (Windows 7, Intel Core i5 with four cores), I get the following output:
Concurrent test
Time: 2.006006
Number of primes: 431
Sequential test
Time: 0.010000
Number of primes: 431
Multiprocessing test
Time: 1.412003
Number of primes: 431
--
Concurrent test
Time: 1.302003
Number of primes: 447
Sequential test
Time: 0.010000
Number of primes: 447
Multiprocessing test
Time: 1.252003
Number of primes: 447
--
Concurrent test
Time: 1.280002
Number of primes: 446
Sequential test
Time: 0.010000
Number of primes: 446
Multiprocessing test
Time: 1.250002
Number of primes: 446
--
Concurrent test
Time: 1.260002
Number of primes: 446
Sequential test
Time: 0.010000
Number of primes: 446
Multiprocessing test
Time: 1.250002
Number of primes: 446
--
Concurrent test
Time: 1.282003
Number of primes: 473
Sequential test
Time: 0.010000
Number of primes: 473
Multiprocessing test
Time: 1.260002
Number of primes: 473
--
The question I have is whether I can somehow make this faster by running it concurrently on Windows with Python 3.6.4 |Anaconda, Inc.|. I read here on SO (Why is creating a new process more expensive on Windows than Linux?) that creating new processes on Windows is expensive. Is there anything that can be done to speed things up? Am I missing something obvious?
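To get a feel for how much of the measured time is just process start-up rather than the prime checking itself, something like the sketch below could isolate the pool creation cost (noop is only a trivial placeholder task, not part of my actual code):

import multiprocessing
import time

def noop(x):
    # trivial task; the point is only to force the worker processes to start
    return x

if __name__ == '__main__':
    start_tm = time.time()
    with multiprocessing.Pool(processes=3) as pool:
        pool.map(noop, range(3))
    print("Pool start-up + trivial work: {:.6f}".format(time.time() - start_tm))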
I also tried to create the Pool only once, but it did not seem to help a lot.
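Roughly, the "create the Pool only once" attempt looked like the sketch below (it reuses is_prime and generate_data from the snippet above; run_trials_with_shared_pool is just an illustrative name, not my exact code):

import multiprocessing
import time

def run_trials_with_shared_pool(nbr_trails=5):
    # pay the process start-up cost once and reuse the same workers for every trial
    with multiprocessing.Pool(processes=3) as pool:
        for trail in range(nbr_trails):
            numbers = generate_data(trail * 10)
            chunk = len(numbers) // 3
            start_tm = time.time()
            primes = pool.map(is_prime, numbers, chunksize=chunk)
            print("Time: {:.6f}".format(time.time() - start_tm))
            print("Number of primes: {}\n".format(sum(primes)))

if __name__ == '__main__':
    run_trials_with_shared_pool()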
Edit:
The original code is structured more or less like this:
class Foo(object):

    def g(self) -> int:
        # function performing simple calculations
        # single function call is fast (~500 ms)
        pass

    def run(self):
        nbr_processes = multiprocessing.cpu_count() - 1
        with multiprocessing.Pool(processes=nbr_processes) as pool:
            foos = get_initial_foos()
            solution_found = False
            while not solution_found:
                # one iteration
                chunk = len(foos) // nbr_processes
                vals = list(pool.map(Foo.g, foos, chunksize=chunk))
                foos = modify_foos()
with foos having 1000 elements. It is not possible to tell in advance how quickly the algorithm converges and how many iterations will be executed, possibly thousands.
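To see where the break-even point between one sequential iteration and one pool.map iteration lies, I can time both over the same 1000 objects. Below is a sketch; FakeFoo and its g method are stand-ins for my real class, and the workload inside g is made up, so the absolute numbers would not mean much:

import multiprocessing
import time

class FakeFoo(object):
    def g(self) -> int:
        # placeholder for the real per-object calculation
        return sum(i * i for i in range(1000))

if __name__ == '__main__':
    foos = [FakeFoo() for _ in range(1000)]
    nbr_processes = max(1, multiprocessing.cpu_count() - 1)

    # sequential version of one iteration
    start_tm = time.time()
    vals = [foo.g() for foo in foos]
    print("Sequential iteration: {:.6f}".format(time.time() - start_tm))

    # parallel version of one iteration, pool created outside the timed part
    with multiprocessing.Pool(processes=nbr_processes) as pool:
        chunk = len(foos) // nbr_processes
        start_tm = time.time()
        vals = list(pool.map(FakeFoo.g, foos, chunksize=chunk))
        print("Parallel iteration: {:.6f}".format(time.time() - start_tm))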