I am playing around with concurrent.futures in Python as a means to understand a few simple implementations that use multiprocessing. However, I've come across a very unexpected result. Before I begin, the relevant system details: I'm running Windows on a machine with two physical cores.
Take the following arithmetic series, which gives the mean of the first n non-negative integers:

mean = (0 + 1 + 2 + ... + (n - 1)) / n = (n - 1) / 2
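Just to sanity-check that closed form before scaling it up to 500 million (this snippet is purely illustrative and not part of the benchmark below):

# Closed-form check on a small n: the mean of 0, 1, ..., n-1 is (n - 1) / 2.
n = 10
assert sum(range(n)) / n == (n - 1) / 2  # both sides equal 4.5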
With this idea in mind, I create a function that computes the mean of the integers between a low bound a (inclusive) and a high bound b (exclusive). I then run a test with and without multiprocessing on a range of 500 million integers:
import time
import concurrent.futures


def mean(a, b):
    total_sum = 0
    for next_int in range(a, b):
        total_sum += next_int
    return total_sum / (b - a)


if __name__ == '__main__':
    n = 500000000  # 500 Million

    wall_time = time.time()
    base_ans = mean(0, n)  # From 0 to n-1.
    print("Single Thread Time: " + str(time.time() - wall_time) + " sec.")

    # Split the range into two equal-sized halves, one per worker.
    work = [(0, int(n/2)), (int(n/2), n)]
    num_workers = 2  # One process per core!

    test_ans = 0
    wall_time = time.time()
    with concurrent.futures.ProcessPoolExecutor(max_workers=num_workers) as executor:
        future_tasks = {executor.submit(mean, job[0], job[1]): job for job in work}
        for future in concurrent.futures.as_completed(future_tasks):
            test_ans += future.result()
    print("Multiprocessing Time: " + str(time.time() - wall_time) + " sec.")

    # Because the halves are equal-sized, the average of the two partial means
    # equals the overall mean.
    print(str(base_ans) + " == " + str(test_ans / num_workers) + " => " + str(base_ans == (test_ans / num_workers)))
Running this code produces the following output:
Single Thread Time: 41.0769419670105 sec. # CPU Utilization ≈ 35% (from task manager)
Multiprocessing Time: 24.71605634689331 sec. # CPU Utilization ≈ 70% (from task manager)
As we can clearly see, a major speedup was observed (roughly 1.66x). However, if I create 4 workers instead of 2, I get an even greater speedup:
work = [(0, int(n/4)), (int(n/4), int(n/2)), (int(n/2), int(3*n/4)), (int(3*n/4), n)]
num_workers = 4
# ...
Single Thread Time: 41.51883292198181 sec. # CPU Utilization ≈ 35% (from task manager)
Multiprocessing Time: 18.18532919883728 sec. # CPU Utilization = 100% (from task manager)
An even greater speedup can be seen here (roughly 2.28x), and it is consistent over many runs!
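For reference, I'm checking my core count like this (a trivial check; note that os.cpu_count() reports logical processors, which may be higher than the physical core count if Hyper-Threading is enabled):

import os

# Number of logical processors visible to the OS; this may exceed the
# number of physical cores (e.g. with Hyper-Threading enabled).
print(os.cpu_count())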
- Since only two processes can run simultaneously on this two (physical) core system, is the efficiency of the Windows scheduler the reason for this continued speedup?
- How can I choose a max_workers value that provides the fastest runtime? How many more processes should I add past the physical core count? (A rough benchmark sketch for measuring this follows below.)
- And lastly, does adding more processes past the physical core count prevent the threads (in multithreading) within each process from running efficiently?
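For what it's worth, below is the kind of rough timing sweep I was planning to use to pick max_workers empirically. It is only a sketch: the candidate worker counts are arbitrary choices of mine, the mean function and n are the same as above, and I've used executor.map instead of submit/as_completed just to keep it short.

import time
import concurrent.futures


def mean(a, b):
    total_sum = 0
    for next_int in range(a, b):
        total_sum += next_int
    return total_sum / (b - a)


if __name__ == '__main__':
    n = 500000000  # 500 Million, same as above.
    for num_workers in (1, 2, 3, 4, 6, 8):  # arbitrary candidate worker counts
        # Split [0, n) into num_workers roughly equal-sized chunks.
        bounds = [int(i * n / num_workers) for i in range(num_workers + 1)]
        work = list(zip(bounds[:-1], bounds[1:]))
        wall_time = time.time()
        with concurrent.futures.ProcessPoolExecutor(max_workers=num_workers) as executor:
            partial_means = list(executor.map(mean, *zip(*work)))
        print(str(num_workers) + " workers: " + str(time.time() - wall_time) + " sec.")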