
I want to list all the combinations of a base36 string from 0000 to zzzz.
When I run it with a single thread it finishes faster (~5-6 seconds) than with multiple threads (~13-14 seconds).
I read here why using more threads can be slower than using fewer threads.
But I have 4 cores (8 logical processors) and I don't think that is the issue in my case.

Am I doing something wrong in my code?
Maybe the join() function slows things down?

Here is my code:

import time
import threading

# https://codegolf.stackexchange.com/questions/169432/increment-base-36-strings?page=1&tab=votes#tab-top
# Increments a base36 string by one: '9' rolls over to 'a', and 'z'
# carries into the digit to its left.
def inc_base36(s):
    L, R = s[:-1], s[-1:]
    return s and [[L + chr(ord(R) + 1), inc_base36(L) + '0'][R > 'y'], L + 'a'][R == '9']

def bruteforce(start_id, end_id):
    while start_id != end_id:
        start_id = inc_base36(start_id)

# Single thread
# --- 5.15600013733 seconds ---
start_time = time.time()
bruteforce('0000', 'zzzz')
print("--- %s seconds ---" % (time.time() - start_time))

# Two threads
# --- 13.603000164 seconds ---
t1 = threading.Thread(target=bruteforce, args = ('0000', 'hzzz')) # in decimal (0, 839807)
t2 = threading.Thread(target=bruteforce, args = ('i000', 'zzzz')) # in decimal (839808, 1679615)

start_time = time.time()
t1.start()
t2.start()
t1.join()
t2.join()
print("--- %s seconds ---" % (time.time() - start_time))

# Three threads
# --- 14.3159999847 seconds ---
t1 = threading.Thread(target=bruteforce, args = ('0000', 'bzzz')) # in decimal (0, 559871)
t2 = threading.Thread(target=bruteforce, args = ('c000', 'nzzz')) # in decimal (559872, 1119743)
t3 = threading.Thread(target=bruteforce, args = ('o000', 'zzzz')) # in decimal (1119744, 1679615)

start_time = time.time()
t1.start()
t2.start()
t3.start()
t1.join()
t2.join()
t3.join()
print("--- %s seconds ---" % (time.time() - start_time))
E235
  • `join` cannot slow anything down. It just waits until the thread finishes. Why do you have a `count=0`? It seems to be useless and I don't know if Python is smart enough to serialize access to this variable. This could be a potential time-hole. – Poshi Aug 26 '18 at 14:28
  • @Poshi it isn't related to the `count=0`; I removed it and checked again and received the same results. – E235 Aug 26 '18 at 17:21
  • Try making a loop saving into t[n], then try different n_max to see how the number of threads impacts run time. – Gilbert Aug 26 '18 at 17:26
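
A minimal sketch of Gilbert's suggestion, assuming the bruteforce and inc_base36 definitions from the question are in scope; the to_base36 helper and the chunking scheme here are illustrative additions, not part of the original code:

import time
import threading

BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n, width=4):
    # Integer -> zero-padded lowercase base36 string, e.g. 839808 -> 'i000'.
    digits = ""
    while n:
        n, r = divmod(n, 36)
        digits = BASE36[r] + digits
    return digits.rjust(width, "0")

def timed_run(n_threads, total=36**4):
    # Split [0, total) into contiguous chunks, one thread per chunk,
    # and time how long it takes all threads to finish.
    step = total // n_threads
    bounds = [to_base36(i * step) for i in range(n_threads)] + ["zzzz"]
    threads = [threading.Thread(target=bruteforce, args=(bounds[i], bounds[i + 1]))
               for i in range(n_threads)]
    start_time = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start_time

for n in (1, 2, 3, 4, 8):
    print("%d thread(s): %.2f seconds" % (n, timed_run(n)))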

2 Answers


Most Python implementations have a GIL (Global Interpreter Lock), which allows execution of only one thread at a time. You should use Jython, which doesn't have a GIL, or use multiprocessing in your script.
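
For illustration, a minimal multiprocessing sketch of the two-way split from the question; it assumes bruteforce is defined at module level as above (the __main__ guard matters on platforms that spawn rather than fork new processes):

import time
from multiprocessing import Process

if __name__ == "__main__":
    # Each half of the range runs in its own interpreter process,
    # so the GIL of one process does not block the other.
    p1 = Process(target=bruteforce, args=('0000', 'hzzz'))
    p2 = Process(target=bruteforce, args=('i000', 'zzzz'))

    start_time = time.time()
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print("--- %s seconds ---" % (time.time() - start_time))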

Yossi
  • I read about multiprocessing: https://stackoverflow.com/a/3046201/2153777 but it sounds like threads are more lightweight for what I need and spawning processes is slower. Anyway, I will try it. – E235 Aug 26 '18 at 18:36
  • @E235 If you use a `multiprocessing.Pool`, the worker processes are re-used rather than restarted for each calculation. And how "lightweight" creating a new process is depends very much on the operating system. For systems that use `fork()` to create new processes and use copy-on-write memory management, creating a new process is not all that expensive. – Roland Smith Aug 26 '18 at 20:44
  • Threads are indeed lighter, but Python's limitations won't let you leverage more than 1 of your CPU cores. – Yossi Aug 27 '18 at 17:15
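
A sketch of the `multiprocessing.Pool` approach Roland Smith describes in the comments above (Python 3; again assuming the question's bruteforce is defined at module level):

import time
from multiprocessing import Pool

if __name__ == "__main__":
    # The two workers are created once and re-used for every chunk;
    # starmap unpacks each (start, end) tuple into bruteforce's arguments.
    chunks = [('0000', 'hzzz'), ('i000', 'zzzz')]
    start_time = time.time()
    with Pool(processes=2) as pool:
        pool.starmap(bruteforce, chunks)
    print("--- %s seconds ---" % (time.time() - start_time))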

While Yossi's answer is correct, a faster solution could be to use the tools from the standard library.

In this case, itertools.product. If I've interpreted your question correctly, you could do:

In [1]: from itertools import product

In [2]: base36 = "0123456789abcdefghijklmnopqrstuvwxyz"

In [3]: res = [''.join(p) for p in product(base36, repeat=4)]

In [4]: res[0], res[-1]
Out[4]: ('0000', 'zzzz')

Let's see how fast this is:

In [5]: %timeit res = [''.join(p) for p in product(base36, repeat=4)]
800 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As you can see, this is much faster. The timing was done using CPython 3.6 on an old Core 2 CPU. The itertools module in CPython is written in C, which is probably a significant part of the reason why it is faster.

And the result seems to be complete:

In [6]: len(res)
Out[6]: 1679616

In [7]: 36**4
Out[7]: 1679616
Roland Smith
  • Thanks, but this is only good up to 4 letters. If I want to run it over 5 letters I get a `MemoryError`. This is why I tested with threads and gave an "easy" example with 4 letters. But when I need to run it over 5 letters, that's 36^5 = ~60 million combinations, and when running it with threads I can give each thread a group of lines and divide the memory between the threads. – E235 Aug 26 '18 at 18:33
  • @E235 It doesn't matter in how many pieces you divide a list of 36⁵. The total length of all lists will remain 36⁵. – Roland Smith Aug 26 '18 at 18:44
  • Right, but if I can run in **parallel** from `00000` to `hhhha` and from `hhhhb` to `zzzzz`, it should be faster than just running from `00000` to `zzzzz`. – E235 Aug 26 '18 at 18:49
  • @E235 Not necessarily. In CPython, running in parallel means using `multiprocessing`. That means that at the end of the calculations, the results have to be transferred *back to the parent process*, since they don't share memory. So every combination exists in at least *two* processes (which basically doubles the amount of memory required). Second, transmitting the results between processes (IPC) also takes time and resources. For large amounts of data this can overwhelm the time needed for the calculations. – Roland Smith Aug 26 '18 at 20:40
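
For what it's worth, a memory-friendly sketch relevant to the `MemoryError` discussed in these comments: itertools.product is itself a lazy iterator, so the candidates can be consumed one at a time instead of being collected into a list, even at 5 letters:

from itertools import product

base36 = "0123456789abcdefghijklmnopqrstuvwxyz"

# Only one candidate string exists in memory at a time, so no list of
# 36**5 elements is ever built.
count = 0
for p in product(base36, repeat=5):
    candidate = ''.join(p)  # e.g. pass this to the actual check
    count += 1
print(count)  # 60466176 == 36**5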