
I need to implement a search function for an array of a complex class, but when I changed it to multi-threaded, I found it became slower than before!

So I tested with simple code and confirmed it. My code:

import numpy as np
import threading
import time


class Test(object):
    
    def __init__(self):
        self.num_workers = 1
        self.workers = []
        self.poses = []
        self.arr = None

    def search(self, arr, val):
        self.poses = []
        for i, a in enumerate(arr):
            if a == val:
                self.poses.append(i)
        return self.poses

    def search_worker(self, val, ID):
        search_len = int(len(self.arr) / self.num_workers)
        prefix = ID * search_len
        if ID == self.num_workers - 1:
            search_len = int(len(self.arr) - prefix)

        for i in range(search_len):
            if self.arr[prefix + i] == val:
                # store the global index, not the chunk-local one
                self.poses.append(prefix + i)

    def search_multi_thread(self, arr, val):
        self.arr = arr
        self.poses = []
        self.num_workers = 5
        for i in range(self.num_workers):
            worker = threading.Thread(target=self.search_worker, args=(val, i,))
            worker.start()
            self.workers.append(worker)
        for i in range(self.num_workers):
            self.workers[i].join()
        return self.poses
    

if __name__ == '__main__':
    t = Test()
    sample = np.random.randint(1000, size=50000000)
    t1 = time.perf_counter()
    res = t.search(sample, 65)
    t2 = time.perf_counter()
    print(F'Elapsed time to search = {t2 - t1}')
    
    t1 = time.perf_counter()
    res = t.search_multi_thread(sample, 65)
    t2 = time.perf_counter()
    print(F'Elapsed time to search with multiple thread = {t2 - t1}')

Result:

Elapsed time to search = 13.291269699999999

Elapsed time to search with multiple thread = 17.8231911

Environment:

OS = Windows 10

python = 3.7.7

CPU = Intel core i7 6700HQ

What am I doing wrong?

How can I solve this problem? (I read about multiprocessing, but it seems that each process has a separate address space, so they can't access a single array.)
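(Editor's note on the parenthetical: processes indeed don't share the parent's memory by default, but the work can still be split by handing each process its own chunk plus that chunk's global offset, e.g. with `multiprocessing.Pool`. A minimal sketch, not part of the original question; `search_chunk` and the worker count are illustrative names/values:)

```python
import numpy as np
from multiprocessing import Pool

def search_chunk(args):
    # Each worker receives a copy of one chunk plus the chunk's global
    # offset, so the indices it returns refer to the full array.
    chunk, offset, val = args
    return [offset + i for i, a in enumerate(chunk) if a == val]

if __name__ == '__main__':
    arr = np.random.randint(1000, size=1_000_000)
    n_workers = 4
    step = len(arr) // n_workers
    jobs = []
    for i in range(n_workers):
        start = i * step
        stop = len(arr) if i == n_workers - 1 else start + step
        jobs.append((arr[start:stop], start, 65))
    with Pool(n_workers) as pool:
        # pool.map returns one list of indices per chunk; flatten them.
        poses = [p for part in pool.map(search_chunk, jobs) for p in part]
```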

ali kiani
  • You are doing a CPU-bound task with multithreading. This is not what multithreading is for, I suppose. Do you want multiprocessing? – Epsi95 Feb 11 '21 at 06:00
  • @Epsi95 Multithreading should be able to use multiple cores. – Barmar Feb 11 '21 at 06:03
  • I want multithreading. How do I do that? – ali kiani Feb 11 '21 at 06:03
  • @alikiani That's what you have. However, you need a lock around access to `self.poses`. – Barmar Feb 11 '21 at 06:05
  • The performance difference could be that the sequential version has fewer cache misses. – Barmar Feb 11 '21 at 06:06
  • @Barmar but this defeats the purpose of multithreading. In multithreading, the computation will switch between threads, which adds extra overhead. This is my understanding; I may be wrong. I would chunkify the input data and give it to different processes. – Epsi95 Feb 11 '21 at 06:54
  • 1
    @Epsi95 Both multi-threading and multi-processing use multiple cores. The difference is that threads are in the same address space, processes are independent address spaces. – Barmar Feb 11 '21 at 06:59
  • Thank you for the information. But don't you think that when doing computations, we should not stop one computation and switch to a different one? That switching can be costly. Shouldn't one process instead do the computation linearly? – Epsi95 Feb 11 '21 at 07:01
  • Actually, according to the linked question, CPython isn't able to do that with threads. – Barmar Feb 11 '21 at 07:01
  • @Barmar Can I solve this with multiprocessing? – ali kiani Feb 11 '21 at 07:05
  • Please link to a tutorial on how to share between processes. – ali kiani Feb 11 '21 at 07:05
  • @alikiani There's an answer below that shows how to do it with multiprocessing. – Barmar Feb 11 '21 at 07:07

2 Answers


Note that threads tend to increase the efficiency of your program when the processor would otherwise sit idle (whatever the cause: I/O, sleeps, user interaction, etc.) during the processing of your work. If that is not the case, the overhead of thread switching simply degrades the performance of your program further.

In your case, the processor sits idle for very little time, and you are using far too many threads for this task. Moreover, in CPython the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so a CPU-bound loop cannot run in parallel across threads at all. The thread-switching overhead therefore outweighs whatever benefit the threads could provide, and the performance of your program degrades further.
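(Editor's sketch, not part of the original answer, illustrating the GIL point: a pure-Python CPU-bound loop gains essentially nothing from a second thread, because only one thread executes bytecode at a time. The function name `count` and the iteration count are arbitrary.)

```python
import threading
import time

def count(n):
    # Pure-Python CPU-bound loop; it holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i
    return total

N = 5_000_000

# Sequential: two calls, one after the other.
t0 = time.perf_counter()
count(N)
count(N)
seq = time.perf_counter() - t0

# Threaded: two threads, but the GIL lets only one execute Python
# bytecode at a time, so wall-clock time is about the same or worse.
t0 = time.perf_counter()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
thr = time.perf_counter() - t0

print(f'sequential: {seq:.2f}s  threaded: {thr:.2f}s')
```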

Grasshopper

Massive performance boost after using multiprocessing.

import numpy as np
import time
import multiprocessing


class Test(object):

    def __init__(self):
        self.num_workers = 1
        self.workers = []
        self.poses = []
        self.arr = None

    def search(self, arr, val):
        self.poses = []
        for i, a in enumerate(arr):
            if a == val:
                self.poses.append(i)
        return self.poses

    def search_worker(self, val, ID):
        search_len = int(len(self.arr) / self.num_workers)
        prefix = ID * search_len
        if ID == self.num_workers - 1:
            search_len = int(len(self.arr) - prefix)

        for i in range(search_len):
            if self.arr[prefix + i] == val:
                # store the global index, not the chunk-local one
                self.poses.append(prefix + i)

    def search_multi_thread(self, arr, val):
        self.arr = arr
        # A plain list is not shared across processes, so worker appends
        # would be lost; a Manager list proxies appends back to the parent.
        manager = multiprocessing.Manager()
        self.poses = manager.list()
        self.workers = []
        self.num_workers = 5
        for i in range(self.num_workers):
            worker = multiprocessing.Process(target=self.search_worker, args=(val, i))
            worker.start()
            self.workers.append(worker)
        for i in range(self.num_workers):
            self.workers[i].join()
        return list(self.poses)


if __name__ == '__main__':
    t = Test()
    sample = np.random.randint(1000, size=50000000)
    t1 = time.perf_counter()
    res = t.search(sample, 65)
    t2 = time.perf_counter()
    print(F'Elapsed time to search = {t2 - t1}')

    t1 = time.perf_counter()
    res = t.search_multi_thread(sample, 65)
    t2 = time.perf_counter()
    print(F'Elapsed time to search with multiprocessing = {t2 - t1}')
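(Editor's side note, not part of this answer: since `sample` is a NumPy array, a vectorized comparison performs the whole search in compiled code and is typically far faster than either threads or processes.)

```python
import numpy as np

sample = np.random.randint(1000, size=50_000_000)

# The boolean mask `sample == 65` is computed in C; np.where then
# returns the indices where the mask is True.
poses = np.where(sample == 65)[0]
```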
SuperNoob