How to decrease execution time using multi-threading in Python

Question

I am performing a DCT on a Raspberry Pi. I've broken the image into 8x8 blocks. Initially I performed the DCT in a nested for loop (without multithreading), and it took about 18 seconds for a 512x512 image. Here's the code with multiple threads:

```python
#!/usr/bin/env python

from __future__ import print_function, division
import time
start_time = time.time()
import sys
import threading
try:
    import Queue            # Python 2
except ImportError:
    import queue as Queue   # Python 3

import cv2
import numpy as np
import pylab as plt
from numpy import empty, arange, exp, real, pi
from numpy.fft import rfft

queue = Queue.Queue()

# Read the image given on the command line (or a default one) and convert it to grayscale
if len(sys.argv) > 1:
    im = cv2.imread(sys.argv[1])
else:
    im = cv2.imread('baboon.jpg')

im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
h, w = im.shape[:2]
DF = np.zeros((h, w))   # DCT coefficients of all blocks
Nb = 8                  # block size


def dct2(y):
    """2-D DCT of one block: 1-D DCT of every row, then of every column."""
    M = y.shape[0]
    N = y.shape[1]
    a = empty([M, N], float)
    b = empty([M, N], float)

    for i in range(M):
        a[i, :] = dct(y[i, :])
    for j in range(N):
        b[:, j] = dct(a[:, j])

    queue.put(b)


def dct(y):
    """1-D DCT computed from the FFT of the signal mirrored to length 2N."""
    N = len(y)
    y2 = empty(2 * N, float)
    y2[:N] = y[:]
    y2[N:] = y[::-1]

    c = rfft(y2)
    phi = exp(-1j * pi * arange(N) / (2 * N))
    return real(phi * c[:N])


def Main():
    jobs = []
    for row in range(0, h, Nb):
        for col in range(0, w, Nb):
            f = im[row:row + Nb, col:col + Nb]
            thread = threading.Thread(target=dct2(f))
            jobs.append(thread)
            df = queue.get()
            DF[row:row + Nb, col:col + Nb] = df

    for j in jobs:
        j.start()

    for j in jobs:
        j.join()


if __name__ == "__main__":
    Main()

cv2.imwrite('dct_img.jpg', DF)
print("--- %s seconds ---" % (time.time() - start_time))
plt.imshow(DF, cmap='Greys')
plt.show()
```

With multiple threads, this code takes about 25 seconds to execute. What's wrong? Have I implemented multi-threading incorrectly? I want to reduce the time taken to perform the DCT as much as possible (to 1-5 seconds). Any suggestions?

Is there any other concept or method (I've read a post on multiprocessing) that would significantly reduce the execution time?

Welcome to the [GIL](https://wiki.python.org/moin/GlobalInterpreterLock). For CPU-intensive tasks you'll have to use multiprocessing, as you suspected. — Voo, Feb 04 '16 at 10:14

score 1 · Answer 1 · answered Feb 04 '16 at 10:23


Due to the GIL, your threads are executed one at a time (not in parallel), so you might want to switch to multiprocessing. Another option is numba, which can greatly speed up plain Python code and can also release the GIL.
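A small sketch that makes the GIL effect visible, assuming a standard (GIL-enabled) CPython build; `count_down`, `run_sequential` and `run_threaded` are hypothetical names. It runs a pure-Python CPU-bound loop several times in a row and then the same amount of work in separate threads. On such a build the threaded run should not be noticeably faster, because only one thread can execute Python bytecode at a time; the exact numbers depend on the machine.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1


def run_sequential(n, parts):
    """Run `parts` loops one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(parts):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, parts):
    """Run the same loops in `parts` threads and return the elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(parts)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N, PARTS = 1_000_000, 4
    print("sequential: %.2f s" % run_sequential(N, PARTS))
    print("threaded:   %.2f s" % run_threaded(N, PARTS))
```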


Roman Kh


score 1 · Answer 2 · answered Feb 04 '16 at 10:25


In Python, multithreading improves performance only when you mix I/O-bound and CPU-bound tasks.

For your problem you should use multiprocessing.
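A minimal sketch of the multiprocessing approach, assuming Python 3 and the standard `multiprocessing.Pool`. It reuses the question's FFT-based `dct()` formula; `dct_strip` and `blockwise_dct_parallel` are hypothetical names, and the random array stands in for the image you would load with `cv2.imread`. Each worker process receives one strip of 8 rows, so a 512x512 image becomes 64 jobs instead of 4096, which should keep the cost of sending data between processes small compared with the work done.

```python
import multiprocessing as mp

import numpy as np
from numpy.fft import rfft

NB = 8  # block size, as in the question


def dct(y):
    """1-D DCT via the FFT of the mirrored signal (same formula as the question)."""
    n = len(y)
    y2 = np.concatenate([y, y[::-1]]).astype(float)
    c = rfft(y2)
    phi = np.exp(-1j * np.pi * np.arange(n) / (2 * n))
    return np.real(phi * c[:n])


def dct2(block):
    """2-D DCT of one block: rows first, then columns."""
    a = np.array([dct(r) for r in block])          # a[i, :] = dct(block[i, :])
    return np.array([dct(col) for col in a.T]).T   # b[:, j] = dct(a[:, j])


def dct_strip(strip):
    """Transform every NB x NB block in one horizontal strip of NB rows."""
    out = np.empty(strip.shape, float)
    for col in range(0, strip.shape[1], NB):
        out[:, col:col + NB] = dct2(strip[:, col:col + NB])
    return out


def blockwise_dct_parallel(im, processes=None):
    """Split the image into NB-row strips and transform them in worker processes."""
    strips = [im[row:row + NB] for row in range(0, im.shape[0], NB)]
    with mp.Pool(processes) as pool:    # processes=None -> one per CPU core
        results = pool.map(dct_strip, strips)
    return np.vstack(results)


if __name__ == "__main__":
    # Stand-in for cv2.imread('baboon.jpg', cv2.IMREAD_GRAYSCALE)
    im = np.random.randint(0, 256, size=(512, 512)).astype(np.uint8)
    DF = blockwise_dct_parallel(im)
    print(DF.shape)
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by re-importing the main module, creating the pool at module level would try to spawn processes recursively.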


Benjamin


score 0 · Answer 3 · edited May 23 '17 at 12:23

Maybe the other posters are right about the GIL. But OpenCV and NumPy release the GIL in many of their operations, so I would at least expect some speedup from a multithreaded solution.

I would look at how many threads you are creating. It's probably a lot, since you start one for each 8x8-pixel block: a 512x512 image gives 64 × 64 = 4096 threads. Each time a thread is taken off the CPU and replaced by another, there is a small overhead, which adds up quickly when you have many threads.

If this is the case, you will probably gain performance by not starting them all at once: start only as many threads as you have CPU cores (a few more or a few less; just experiment), and start the next one only when one has finished.
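A minimal sketch of the bounded-pool idea, assuming Python 3's `concurrent.futures`; `dct_rows`, `dct2`, `dct_strip` and `blockwise_dct_threaded` are hypothetical names. It submits one job per 8-row strip (64 jobs for a 512x512 image) to a pool of about `os.cpu_count()` threads, and the pool only hands a new job to a thread that has finished its previous one. Note that the pool receives the function and its arguments separately. In the question, `threading.Thread(target=dct2(f))` calls `dct2(f)` immediately in the main thread and passes its return value (`None`) as the target, so every block is computed sequentially before any thread is started. Whether threads then give a real speedup depends on how much time is spent inside NumPy calls that release the GIL, so it is worth measuring on the Pi.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np

NB = 8  # block size, as in the question


def dct_rows(x):
    """1-D DCT of every row of x, using the question's mirrored-FFT formula."""
    n = x.shape[1]
    x2 = np.concatenate([x, x[:, ::-1]], axis=1)
    c = np.fft.rfft(x2, axis=1)[:, :n]
    phi = np.exp(-1j * np.pi * np.arange(n) / (2 * n))
    return np.real(c * phi)


def dct2(block):
    """2-D DCT of one block: rows first, then columns (via transposes)."""
    return dct_rows(dct_rows(block).T).T


def dct_strip(im, row, out):
    """Transform the NB-row strip starting at `row`, writing into `out` in place."""
    for col in range(0, im.shape[1], NB):
        block = im[row:row + NB, col:col + NB].astype(float)
        out[row:row + NB, col:col + NB] = dct2(block)


def blockwise_dct_threaded(im, workers=None):
    out = np.zeros(im.shape, float)
    workers = workers or os.cpu_count() or 1   # about one thread per core
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Function and arguments are passed separately, so dct_strip runs in a
        # worker thread. Strips write to disjoint rows of `out`, so no lock is needed.
        futures = [pool.submit(dct_strip, im, row, out)
                   for row in range(0, im.shape[0], NB)]
        for fut in futures:
            fut.result()   # re-raises any exception from a worker
    return out


if __name__ == "__main__":
    import time
    im = np.random.randint(0, 256, size=(512, 512)).astype(np.uint8)  # stand-in image
    t0 = time.time()
    DF = blockwise_dct_threaded(im)
    print("--- %.2f seconds ---" % (time.time() - t0))
```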

Look at the answers to this question on how to do this with minimal effort.
