In order to understand how threads work in Python, I wrote the following simple function:
def sum_list(thelist:list, start:int, end:int):
s = 0
for i in range(start,end):
s += thelist[i]**3//10
return s
Then I created a list and tested how much time it takes to compute its sum:
LISTSIZE = 5000000
big_list = list(range(LISTSIZE))
start = time.perf_counter()
big_sum=sum_list(big_list, 0, LISTSIZE)
print(f"One thread: sum={big_sum}, time={time.perf_counter()-start} sec")
It took about 2 seconds.
Then I tried to partition the computation into threads, such that each thread computes the function on a subset of the list:
THREADCOUNT=4
SUBLISTSIZE = LISTSIZE//THREADCOUNT
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(THREADCOUNT) as executor:
futures = [executor.submit(sum_list, big_list, i*SUBLISTSIZE, (i+1)*SUBLISTSIZE) for i in range(THREADCOUNT)]
big_sum = 0
for res in concurrent.futures.as_completed(futures): # return each result as soon as it is completed:
big_sum += res.result()
print(f"{THREADCOUNT} threads: sum={big_sum}, time={time.perf_counter()-start} sec")
Since I have a 4-cores CPU, I expected it to run 4 times faster. But it did not: it ran in about 1.8 seconds on my Ubuntu machine (on my Windows machine, with 8 cores, it ran even slower than the single-thread version: about 2.2 seconds).
Is there a way to use ThreadPoolExecutor
(or another threads-based mechanism in Python) so that I can compute this function faster?