
I'm slowly switching to Python, and I wanted to make a simple test comparing the performance of a simple array operation: I generate a random 1000x1000 array and add one to each of its values.

Here is my script in Python:

import time

from numpy.random import random

def testAddOne(data):
    """
    Test addOne
    """
    return data + 1

i = 1000
data = random((i, i))
start = time.perf_counter()
for x in range(1000):
    testAddOne(data)

stop = time.perf_counter()
print(stop - start)

And my function in MATLAB:

function test
%parameter declaration
c=rand(1000);

tic
for t = 1:1000
    testAddOne(c);
end
fprintf('Structure: \n')
toc
end

function testAddOne(c)
c = c + 1;
end

The Python script takes 2.77–2.79 seconds, about the same as the MATLAB function (I'm actually quite impressed by NumPy!). What would I have to change in my Python script to use multithreading? I can't do that in MATLAB since I don't have the toolbox.

m_power
  • Is that really fair on `MATLAB`, to add one element at a time, when you could do it in one go? That's where the power of MATLAB lies. If you `inline` the function, I wouldn't be surprised if you got some appreciable improvement with `MATLAB`. – Divakar Apr 04 '14 at 13:31
  • @Divakar You should check the code again; MATLAB is adding the ones in a single call. What is probably misleading you is that I'm running this 1000 times, which also corresponds to the length of the array. To see this, the for loop `t = 1:1000` could just as well be `t = 1:randi([1000,2000])`. – m_power Apr 04 '14 at 13:39
  • My point is that MATLAB is not good when you do operations on one element at a time. That's why, for example, if you do matrix multiplication one element at a time and compare it with the built-in matrix multiplication, you see a huge difference in performance. – Divakar Apr 04 '14 at 13:46
  • That's right, hence my function uses the MATLAB vectorized approach. If I didn't do this, I would have to use two for loops over i and j, going through each index to add one, which, like you said, wouldn't use the power of MATLAB. – m_power Apr 04 '14 at 13:57
  • Looks like a fair comparison. – Divakar Apr 04 '14 at 14:09
  • Some relevant discussion [here](http://stackoverflow.com/q/11442191/553404), [here](http://stackoverflow.com/q/5260068/553404) and [here](http://stackoverflow.com/q/16617973/553404). – YXD Apr 04 '14 at 14:26

1 Answer


Multithreading in Python is only useful for situations where threads get blocked, e.g. waiting on input, which is not the case here (see the answers to this question for more details). However, multiprocessing is easy to do in Python. Multiprocessing in general is covered here.
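For comparison, here is a minimal thread-based sketch of the same test (a sketch only, assuming a 4-way split of the work). Because the loop is CPU-bound, the GIL generally prevents the Python-level threads from running in parallel, although NumPy may release the GIL inside the array addition itself, so your timings may vary:

import time
import threading
from numpy.random import random

def testAddOne(data):
    return data + 1

def testAddN(data, N):
    # CPU-bound loop: the GIL serializes the Python-level work here.
    for x in range(N):
        testAddOne(data)

if __name__ == '__main__':
    data = random((1000, 1000))
    num_threads = 4
    num_adds = 10000

    start = time.perf_counter()
    threads = [threading.Thread(target=testAddN, args=(data, num_adds // num_threads))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    stop = time.perf_counter()
    print("Elapsed", stop - start)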

A multiprocessing program taking a similar approach to your example is below:

import time
from multiprocessing import Process
from numpy.random import random

def testAddOne(data):
    return data + 1

def testAddN(data, N):
    # Perform N additions; the results are discarded, as in the original test.
    for x in range(N):
        testAddOne(data)

if __name__ == '__main__':
    matrix_size = 1000
    num_adds = 10000
    num_processes = 4

    data = random((matrix_size, matrix_size))

    start = time.perf_counter()
    if num_processes > 1:
        # Each child process receives its own copy of data and performs
        # an equal share of the additions.
        processes = [Process(target=testAddN, args=(data, num_adds // num_processes))
                     for i in range(num_processes)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
    else:
        testAddN(data, num_adds)

    stop = time.perf_counter()
    print("Elapsed", stop - start)

A more useful example, which uses a pool of worker processes to successively add 1 to different matrices, is below:

import time
from multiprocessing import Pool
from numpy.random import random

def testAddOne(data):
    return data + 1

def testAddN(dataN):
    # Unpack a (matrix, count) pair and apply testAddOne count times,
    # this time keeping and returning the result.
    data, N = dataN
    for x in range(N):
        data = testAddOne(data)
    return data

if __name__ == '__main__':
    num_matrices = 4
    matrix_size = 1000
    num_adds_per_matrix = 2500

    num_processes = 4

    inputs = [(random((matrix_size, matrix_size)), num_adds_per_matrix)
              for i in range(num_matrices)]
    #print(inputs) # test using, e.g., matrix_size = 2

    start = time.perf_counter()

    if num_processes > 1:
        proc_pool = Pool(processes=num_processes)
        outputs = proc_pool.map(testAddN, inputs)
    else:
        # list() forces the lazy map to run so the timing is meaningful
        outputs = list(map(testAddN, inputs))

    stop = time.perf_counter()
    #print(outputs) # test using, e.g., matrix_size = 2
    print("Elapsed", stop - start)

In this case the code in testAddN actually does something with the result of calling testAddOne, and you can uncomment the print statements to check that some useful work is being done.

In both cases I've changed the total number of additions to 10000; with fewer additions, the cost of starting up processes becomes more significant (but you can experiment with the parameters). You can also experiment with num_processes. On my machine, compared to running everything in the same process with num_processes=1, I got just under a 2x speedup by spawning four processes with num_processes=4.
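If you want to measure that effect directly, a minimal sketch of such an experiment (sweeping the pool size over the same workload; the parameter values here are just example choices) could look like:

import time
from multiprocessing import Pool
from numpy.random import random

def testAddOne(data):
    return data + 1

def testAddN(dataN):
    data, N = dataN
    for x in range(N):
        data = testAddOne(data)
    return data

if __name__ == '__main__':
    inputs = [(random((1000, 1000)), 2500) for i in range(4)]

    # Time the same workload with different numbers of worker processes.
    for num_processes in (1, 2, 4):
        start = time.perf_counter()
        with Pool(processes=num_processes) as proc_pool:
            proc_pool.map(testAddN, inputs)
        stop = time.perf_counter()
        print(num_processes, "processes:", stop - start)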

TooTone
  • I'm trying your first example right now, and I got a 2.16x speedup (with 4 processes). – m_power Apr 04 '14 at 15:47
  • Still trying to understand what you are doing in the second example! But thanks for both solutions! – m_power Apr 04 '14 at 16:25
  • @m_power thanks for the feedback! In computing, `map` is used to mean: take one list `L`, process each element in the list with some function `f`, producing another list `L'`. The list `L=[x0, x1, ..., xn]` becomes `L'=[f(x0), f(x1), ..., f(xn)]`. What I really like about the Python process pool is that a multi-process map looks almost exactly the same as a single-process map. – TooTone Apr 04 '14 at 16:29
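A minimal sketch of that single-process/multi-process symmetry (using a hypothetical square function f, not the matrix code above):

from multiprocessing import Pool

def f(x):
    # hypothetical example function: square its argument
    return x * x

if __name__ == '__main__':
    L = [0, 1, 2, 3]
    print(list(map(f, L)))   # single-process map: [0, 1, 4, 9]
    with Pool(processes=2) as p:
        print(p.map(f, L))   # multi-process map: same result, nearly identical call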