3

I was working on multithreading a basic 2-D DLA simulation. In Diffusion-Limited Aggregation (DLA), particles perform a random walk and aggregate when they touch the current aggregate.

In the simulation, I have 10,000 particles walking in a random direction at each step. I use a pool of workers and a queue to feed them. I feed them with a list of particles and each worker calls the method .updatePositionAndAggregate() on each particle.

If I have one worker, I feed it a list of 10,000 particles; if I have two workers, I feed each of them a list of 5,000 particles; if I have three workers, each gets a list of about 3,333 particles, and so on.
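
Concretely, the split is equivalent to something like this sketch (chunk_particles is just an illustrative name, the real slicing is in the main loop further down):

# Sketch of how the particle list is split between workers.
# chunk_particles is only an illustrative name; the real slicing is shown below.
def chunk_particles(particles, num_threads):
    chunks = []
    for i in range(num_threads):
        j = i * len(particles) / num_threads        # integer division in Python 2
        k = (i + 1) * len(particles) / num_threads
        chunks.append(particles[j:k])
    return chunks

# chunk_particles(range(10000), 3) gives lists of 3333, 3333 and 3334 particles.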

Here is the code for the worker:

from threading import Thread


class Worker(Thread):
    """
    A worker processes a list of particles and tries to aggregate them.
    """

    def __init__(self, name, particles):
        """
        Initialize the worker and start it immediately.
        """
        Thread.__init__(self, name = name)
        self.daemon = True
        self.particles = particles  # a Queue from which lists of particles are fetched
        self.start()

    def run(self):
        """
        The worker is started just after its creation and waits to be fed a list
        of particles to process.
        """

        while True:

            particles = self.particles.get()
            # print self.name + ': woke up with ' + str(len(particles)) + ' particles\n'

            # Process the particles we have been fed.
            for particle in particles:
                particle.updatePositionAndAggregate()

            self.particles.task_done()
            # print self.name + ': is done' + '\n'

And in the main thread:

from Queue import Queue  # Python 2; the module is named queue in Python 3
import datetime

# Create the workers.
workerQueue = Queue(num_threads)
for i in range(0, num_threads):
    Worker("worker_" + str(i), workerQueue)

workerDurations = []  # duration of each feed/join cycle

# Run the simulation until all the particles have been created.
while some_condition():

    # Feed all the workers.
    startWorker = datetime.datetime.now()
    for i in range(0, num_threads):
        j = i * len(particles) / num_threads
        k = (i + 1) * len(particles) / num_threads

        # Feeding the worker thread.
        # print "main: feeding " + worker.name + ' ' + str(len(worker.particles)) + ' particles\n'
        workerQueue.put(particles[j:k])


    # Wait for all the workers
    workerQueue.join()

    workerDurations.append((datetime.datetime.now() - startWorker).total_seconds())
    print sum(workerDurations) / len(workerDurations)

So I print the average time spent waiting for the workers to finish their tasks. I ran some experiments with different numbers of threads.

| num threads | average worker duration (s)   |
|-------------|-------------------------------|
| 1           | 0.147835636364                |
| 2           | 0.228585818182                |
| 3           | 0.258296454545                |
| 10          | 0.294294636364                |

I really wonder why adding workers increases the processing time. I thought that having at least 2 workers would decrease it, but it increases dramatically from 0.14 s to 0.23 s. Can you explain why?

EDIT: So the explanation is Python's threading implementation; is there a way to get real parallelism?

Baptiste Pernet
  • Because of GIL, the Global Interpreter Lock. – ForceBru Mar 17 '15 at 17:25
  • Exactly the reason I don't like Python: a very poor memory-management and thread model. – Jason Hu Mar 17 '15 at 17:34
  • It is worth noting that some parallelism can be achieved by using multiprocessing. The number of threads is then limited by the number of processors/cores you have available (2 or 4 on most PCs). – Fran Borcic Mar 17 '15 at 18:41

4 Answers

4

This is happening because threads don't execute at the same time: due to the GIL (Global Interpreter Lock), Python can execute only one thread at a time.

While one thread runs, all the others wait; when it releases the GIL, another one is executed. Spawning threads and switching between them also takes time.

Frankly speaking, the exact code doesn't matter much: for CPU-bound work in Python, code using 100 threads is SLOWER than code using 10 threads (more threads does not automatically mean more speed).
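
As a rough, standalone illustration (not the code from the question), here is a CPU-bound function run twice sequentially and then in two threads; on CPython the threaded version is typically no faster, and often a bit slower:

# Standalone sketch: CPU-bound work run sequentially vs. in two threads.
# Because of the GIL, the threaded version is usually no faster on CPython.
import time
from threading import Thread

def count(n):
    while n > 0:
        n -= 1

N = 10 ** 7

start = time.time()
count(N)
count(N)
print "sequential: %.2f s" % (time.time() - start)

start = time.time()
t1 = Thread(target=count, args=(N,))
t2 = Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print "two threads: %.2f s" % (time.time() - start)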

Here's an exact quote from the Python docs:

> **CPython implementation detail:** In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use `multiprocessing` or `concurrent.futures.ProcessPoolExecutor`. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

Wikipedia about GIL

StackOverflow about GIL

ForceBru
2

Threads in Python (at least in 2.7) are NOT executed simultaneously because of the GIL: https://wiki.python.org/moin/GlobalInterpreterLock - they run in a single process and share the CPU, so you can't use threads to speed up your computation.

If you want to use parallel computation to speed up your calculation (at least in Python 2.7), use processes instead: the multiprocessing package.
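
A minimal sketch of what that can look like (update_particle is a placeholder for the real per-particle work, not the method from the question):

# Minimal sketch: parallelising per-particle updates with multiprocessing.Pool.
# update_particle is a placeholder; it must be a picklable, module-level function.
from multiprocessing import Pool, cpu_count

def update_particle(particle):
    # ... do the CPU-heavy update here and return the new particle state ...
    return particle

if __name__ == '__main__':
    particles = range(10000)                      # placeholder data
    pool = Pool(processes=cpu_count())
    particles = pool.map(update_particle, particles)
    pool.close()
    pool.join()

Note that, unlike threads, worker processes do not share memory: each particle (and its updated state) has to be pickled and sent between processes, which is where the overhead mentioned in the comments below comes from.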

Jan Spurny
  • I tried multiprocessing, but the overhead added for managing the processes and the shared-memory objects is only worth it for a lot of particles – Baptiste Pernet Mar 17 '15 at 20:36
  • @BaptistePernet that's because you need to change much more than one library - parallel computing is totally different and usually you need to change your way of thinking - one really good approach is to rewrite your problem so that it can be expressed as `map-reduce`. Then you get the most out of a parallel architecture. Using shared state just leads to more and more problems. I truly believe that shared state should be avoided in most cases. Also, to really gain from parallel processing, you usually need **a lot** of CPUs, because, as you've probably noticed, there is **always** an overhead. – Jan Spurny Mar 17 '15 at 22:10
2

This is due to Python's Global Interpreter Lock (GIL). With the GIL, Python threads never run Python code in parallel, so a CPU-bound program like yours will never use more than 1 CPU core. Have a look here to get started on understanding the GIL: https://wiki.python.org/moin/GlobalInterpreterLock

Check your running processes (Task Manager on Windows, for example) and you will notice that only one core is being utilized by your Python application.

I would suggest looking at multiprocessing in Python, which is not hindered by the GIL: https://docs.python.org/2/library/multiprocessing.html
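
For completeness, the same map-style pattern also exists in concurrent.futures (standard in Python 3, available for 2.7 through the futures backport); update_particle below is again only a placeholder for the real per-particle work:

# Sketch using concurrent.futures.ProcessPoolExecutor instead of multiprocessing.Pool.
# update_particle is a placeholder for the real CPU-heavy per-particle update.
from concurrent.futures import ProcessPoolExecutor

def update_particle(particle):
    return particle

if __name__ == '__main__':
    particles = list(range(10000))                # placeholder data
    with ProcessPoolExecutor() as executor:       # defaults to one worker per CPU
        particles = list(executor.map(update_particle, particles))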

Christian Groleau
  • I tried multiprocessing, but the overhead added for managing the processes and the shared-memory objects is only worth it for a lot of particles – Baptiste Pernet Mar 17 '15 at 20:36
-1

It takes time to actually create the other thread and start processing it. Since we don't have control over the scheduler, I'm willing to bet both of these threads get scheduled on the same core (since the work is so small), so you are only adding the time it takes to create the thread and no parallel processing is done.

MobileMon
  • As others have mentioned, the GIL is the problem. However, I'll leave this here for academic purposes, because in languages with true multithreading this would be applicable. – MobileMon Mar 18 '15 at 13:42