
I have two simple functions (each loops over a range) that can run separately without any dependency. I'm trying to run these two functions both with the Python multiprocessing module and with the threading module.

When I compared the output, I saw that the multiprocessing version takes about 1 second longer than the multithreaded one.

I have read that multithreading is not that efficient because of the Global Interpreter Lock.

Based on the above statements -
1. Is it best to use multiprocessing when there is no dependency between the two processes?
2. How do I calculate the number of processes/threads I can run on my machine for maximum efficiency?
3. Also, is there a way to calculate the efficiency of a program that uses multithreading?

Multithreaded version:

import time
import threading

class Thread1(threading.Thread):
    def __init__(self,threadindicator):
        threading.Thread.__init__(self)
        self.threadind = threadindicator

    def run(self):
        starttime = time.time() 
        if self.threadind == 'A':
            process1()
        else:
            process2()
        endtime = time.time()
        print 'Thread', self.threadind, 'complete : Time Taken = ', endtime - starttime

def process1():
    # heavy CPU-bound nested loop
    for i in range(100000):
        for j in range(10000):
            pass

def process2():
    # much lighter nested loop
    for i in range(1000):
        for j in range(1000):
            pass

def main():

    print 'Main Thread'
    starttime = time.time()
    thread1 = Thread1('A')
    thread2 = Thread1('B')
    thread1.start()
    thread2.start()
    threads = []
    threads.append(thread1)
    threads.append(thread2)

    for t in threads:
        t.join()
    endtime = time.time()
    print 'Main Thread Complete , Total Time Taken = ', endtime - starttime


if __name__ == '__main__':
    main()

Multiprocessing version:

from multiprocessing import Process
import time

def process1():
    starttime = time.time() 
    for i in range(100000):
        for j in range(10000):
            pass
    endtime = time.time()
    print 'Process 1 complete : Time Taken = ', endtime - starttime 


def process2():
    starttime = time.time()
    for i in range(1000):
        for j in range(1000):
            pass
    endtime = time.time()
    print 'Process 2 complete : Time Taken = ', endtime - starttime

def main():
    print 'Main Process start'
    starttime = time.time()
    processlist = []

    p1 = Process(target=process1)
    p1.start()
    processlist.append(p1)

    p2 = Process(target=process2)
    p2.start()
    processlist.append(p2)

    for p in processlist:
        p.join()
    endtime = time.time()
    print 'Main Process Complete - Total time taken = ', endtime - starttime

if __name__ == '__main__':
    main()
  • As a side note: time.time() may have a precision as low as 1 second, and also may get confused by clock changes. So it's not an ideal way to measure performance, especially for code that only takes about a second. – abarnert Oct 12 '13 at 01:04

1 Answer


If you have two CPUs available on your machine, you have two processes which don't have to communicate, and you want to use both of them to make your program faster, you should use the multiprocessing module, rather than the threading module.

The Global Interpreter Lock (GIL) prevents the Python interpreter from making efficient use of more than one CPU by using multiple threads, because only one thread can be executing Python bytecode at a time. Therefore, multithreading won't improve the overall runtime of your application unless you have calls that are blocking (e.g. waiting for IO) or that release the GIL (e.g. numpy will do this for some expensive calls) for extended periods of time. However, the multiprocessing library creates separate subprocesses, and therefore several copies of the interpreter, so it can make efficient use of multiple CPUs.
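As a minimal illustration of this (burn and run_all are just placeholder names I've made up), the following runs the same CPU-bound loop in two threads and then in two processes; on a multi-core machine, the process version should finish in roughly half the wall-clock time:

import time
from threading import Thread
from multiprocessing import Process

def burn():
    # CPU-bound busy loop; pure Python bytecode, so it holds the GIL throughout
    # (xrange avoids building a large list on Python 2)
    for i in xrange(10000000):
        pass

def run_all(label, workers):
    # start all workers, wait for them all, and report total wall-clock time
    start = time.time()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print label, time.time() - start

if __name__ == '__main__':
    run_all('2 threads:  ', [Thread(target=burn) for _ in range(2)])
    run_all('2 processes:', [Process(target=burn) for _ in range(2)])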

However, in the example you gave, one worker finishes very quickly (less than 0.1 seconds on my machine) while the other takes around 18 seconds. The exact numbers may vary depending on your hardware. In that case, nearly all of the work happens in one worker, so you're really only using one CPU regardless. Here, the extra overhead of spawning processes rather than threads is probably what makes the process-based version slower.

If you make both processes do the 18 second nested loops, you should see that the multiprocessing code goes much faster (assuming your machine actually has more than one CPU). On my machine, I saw the multiprocessing code finish in around 18.5 seconds, and the multithreaded code finish in 71.5 seconds. I'm not sure why the multithreaded one took longer than around 36 seconds, but my guess is the GIL is causing some sort of thread contention issue which is slowing down both threads from executing.
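For example, changing process2 in your multiprocessing version so it does the same amount of work as process1 (only the loop bounds change) should make the difference visible:

def process2():
    starttime = time.time()
    for i in range(100000):      # same heavy nested loop as process1
        for j in range(10000):
            pass
    endtime = time.time()
    print 'Process 2 complete : Time Taken = ', endtime - starttime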

As for your second question, assuming there's no other load on the system, you should use a number of processes equal to the number of CPUs on your system. You can discover this by running lscpu on a Linux system, sysctl hw.ncpu on a Mac, or dxdiag from the Run dialog on Windows (there are probably other ways, but this is how I always do it).
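You can also get the count from inside Python itself, which avoids hard-coding a number for each machine:

from multiprocessing import cpu_count

print 'CPUs available:', cpu_count()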

For the third question, the simplest way to figure out how much efficiency you're getting from the extra processes is to measure the total runtime of your program, using time.time() as you were, or the time utility on Linux (e.g. time python myprog.py). The ideal speedup is equal to the number of processes you're using, so a 4-process program running on 4 CPUs can be at most 4x faster than the same program with 1 process; if the extra processes aren't helping much, the speedup will be less than 4x.
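To put numbers on that, a quick sketch (the times below are just the example figures from above; plug in your own measurements):

serial_time   = 36.0   # example: measured runtime with both loops run one after the other
parallel_time = 18.5   # example: measured runtime with both loops in separate processes

speedup    = serial_time / parallel_time   # ideal value: the number of processes used
efficiency = speedup / 2                   # 2 processes here; ideal value is 1.0
print 'speedup = %.2fx, efficiency = %.0f%%' % (speedup, efficiency * 100)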

    "The Global Interpreter Lock (GIL) prevents the Python interpreter from making efficient use of more than one CPU by using multiple threads, because only one thread can be executing at a time. " This is **false**. Python's threads **can** execute at the same time. What they cannot do is to execute *bytecode* at the same time. If one thread does an expensive `numpy` call then other python threads will be executed concurrently since `numpy` releases the GIL. Similarly many C extension release the GIL during expensive operations. – Bakuriu Oct 11 '13 at 21:46
  • Thanks for the clarification - I don't think this was clear in my mind. I've edited my answer to be more accurate on this point. – airfrog Oct 11 '13 at 22:03
  • You can get the CPU count [from inside Python](http://docs.python.org/2/library/multiprocessing.html#miscellaneous), which is better than editing the code manually for each machine. – abarnert Oct 12 '13 at 01:02