0

How can I use multiple threads to increment same variable concurrently in parallel so that the total amount of time taken is reduced to that multipleth of the original synchronous process would take?

Example:

num = 0                     
def incrementer():               
    for i in range(100):    
        global num
        num += 1            
for i in range(100):        
    th=Thread(target=incrementer)
    th.start()              
num                         

The above code does give the intended result (10000), but the time taken is much greater than the synchronous approach:

In [114]: %%timeit                                                     
     ...: num = 0                                                      
     ...: def incrementer():                                                
     ...:     for i in range(100):                                     
     ...:         global num                                           
     ...:         num += 1                                             
     ...: for i in range(100):                                         
     ...:     th=Thread(target=incrementer)                                 
     ...:     th.start()                                               
25.3 ms ± 2.27 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [113]: %%timeit                                                     
     ...: num = 0                                                      
     ...: for i in range(10000):                                       
     ...:     num += 1                                                 
596 µs ± 84.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I was expecting the multi-threading approach to take 1 percent of time taken by the synchronous approach...

How can I make the asynchronous approach take nth of time of the time taken by synchronous approach to complete for n threads, or is it plainly impossible?

Ξένη Γήινος
  • 2,181
  • 1
  • 9
  • 35
  • 1
    Does this answer your question? [python multi-threading slower than serial?](https://stackoverflow.com/questions/10789042/python-multi-threading-slower-than-serial) – Delgan Aug 05 '21 at 13:32
  • 1
    Python doesn't do true threading unless you use something like the `multiprocessing` package. Additionally, your benchmarking is including the cost of setting up the threads, which won't necessarily be trivial and I suspect that it dominating your result. As a side note, your code isn't thread-safe. Integer writes are atomic in Python, but an increment is two operations (read and then write) and can be interrupted in between. – Kemp Aug 05 '21 at 13:37
  • 1
    The runtime overhead of instantiating a new thread far outweighs the time taken for your function to execute –  Aug 05 '21 at 13:48

1 Answers1

1

Multi-threading perfomance boost isn't as linear as you might think.

Assuming you have a 4-core CPU (nc=4), going from single-thread (t=1) to 4 threads (t=4) should show you an decrease in overall computation time. You might expect a 75% decrease in computation time if you have a 4-core CPU, but in the real world, some work still needs to be done to keep the OS going.

Having more threads than you have physical cores would (t>nc), at best, give you the same performance as matching those values (t=nc), but in the real world, managing resources with t>>nc would probably actually run a bit slower.

Next, your approach is to use a single counter, updated by all threads, this adds complexity as you'd need a mutex/semaphore/etc. to ensure the counter updates function as expected with multiple parties reading/writing to it - the more parties, the harder this gets, and, you guessed it, yet another performance hit.

I'd recommend dividing your problem space into the number of threads (t=nc, if you were reading above) such that (e.g.) threadA processes 0-24, threadB: 25-49, threadC 50-74 and threadD: 75-99 - this way they don't need to talk to each other, as each is working on their own slice of the problem space.

tl;dr:

  • when multi-threading, there's rarely any advantage to go beyond t=nc, but functionally, you may even want to stop at t=(nc-1) to leave some CPU time for your OS/etc.
  • thread synchronization needs to be done carefully; try to slice the pie into t chunks before sending the jobs to the threads.
Adam Smooch
  • 1,167
  • 1
  • 12
  • 27