I have a loop that accumulates several sums:

for t in reversed(range(len(inputs))):
  dy = np.copy(ps[t])
  dy[targets[t]] -= 1 
  dWhy += np.dot(dy, hs[t].T)
  dby += dy

The input is very large, so I need to parallelize the computation. I moved the loop body into a separate function and tried running it with ThreadPoolExecutor, but the result is much slower than the sequential algorithm.

Here is my minimal working example:

import time
import concurrent.futures

import numpy as np

#parameters
dWhy = np.random.sample(300)
dby = np.random.sample(300)

def Func(ps, targets, hs,  t):
  global dWhy, dby
  dy = np.copy(ps[t])
  dWhy += np.dot(dy, hs[t].T)
  dby += dy

  return dWhy, dby

if __name__ == '__main__':    

    ps = np.random.sample(100000)
    targets = np.random.sample(100000)
    hs = np.random.sample(100000)

    start = time.time()

    for t in range(100000):
        dy = np.copy(ps[t])
        dWhy += np.dot(dy, hs[t].T)
        dby += dy

    finish = time.time()
    print("One thread: ")
    print(finish-start)

    dWhy = np.random.sample(300)
    dby = np.random.sample(300)
    start = time.time()

    with concurrent.futures.ThreadPoolExecutor() as executor:
        args = ((ps, targets, hs,  t) for t in range(100000))
        for out1, out2  in executor.map(lambda p: Func(*p), args):
            dWhy, dby = out1, out2

    finish = time.time()
    print("Multithreads time: ")
    print(finish-start)

On my PC the single-threaded version takes about 3 s, while the multithreaded version takes about 1 minute.

MrLebovsky
  • Use ProcessPool for CPU bound operations. ThreadPools only speed up I/O bound operations. – Neil Nov 24 '18 at 12:19
  • @Neil could you show me some example for my task? I have KeyError 0 with ProcessPool – MrLebovsky Nov 24 '18 at 12:26
  • Please paste code where you call the function – Neil Nov 24 '18 at 12:27
  • In many cases it could be relevant to just use broadcasting, since numpy is fast enough when you do not perform loops. Could you provide the dimensions (.shape) for each of your objects? – Peter Mølgaard Pallesen Nov 24 '18 at 12:46
  • Please post a minimum working example. – vy32 Nov 24 '18 at 12:51
  • Still looks like a numpy broadcasting problem more than anything else. np.dot(ps,hs) takes 0.000082 seconds on my laptop, with ps and hs being shape (100000,) and the same for the sum np.sum(ps) – Peter Mølgaard Pallesen Nov 24 '18 at 16:00
  • @PeterMølgaardPallesen does that mean I can't speed up the algorithm with numpy? In general, I must show a parallel implementation with speed-up results... – MrLebovsky Nov 24 '18 at 16:25
  • Just saying that parallelization can typically give you a speed-up of 2-16 times, depending on the problem and your available hardware. As I pointed out, looping instead of using proper broadcasting can cause a much larger slowdown. For example, the code you posted, which takes 3 seconds to run, takes 0.000164 seconds with broadcasting, a speed-up of over 10000 times. – Peter Mølgaard Pallesen Nov 26 '18 at 09:09
  • @PeterMølgaardPallesen I'm a beginner in Python, so I don't understand. Can you give me an example of identical operations with broadcasting and without? – MrLebovsky Nov 26 '18 at 11:40
  • Replace: for t in range(100000): dy = np.copy(ps[t]) dWhy += np.dot(dy, hs[t].T) dby += dy with dWhy += np.dot(ps, hs) and dby += np.sum(ps), which does the same thing but takes 10000 times less time. A good place to start reading is https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html In general, the strength of numpy is that by using broadcasting you can achieve C speed for most problems. – Peter Mølgaard Pallesen Nov 26 '18 at 12:26
  • Looks like a duplicate of https://stackoverflow.com/questions/9068478/how-to-parallelize-a-sum-calculation-in-python-numpy – Johann8 Nov 26 '18 at 12:45
  • Possible duplicate of [How to parallelize a sum calculation in python numpy?](https://stackoverflow.com/questions/9068478/how-to-parallelize-a-sum-calculation-in-python-numpy) – Johann8 Nov 26 '18 at 12:46

2 Answers

Turn the lambda into a named function.
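
A named function matters most if you switch to ProcessPoolExecutor: it pickles the callable it sends to the worker processes, and the standard pickle module cannot serialize a lambda. Below is a minimal sketch, assuming the one-dimensional arrays from the question's example. The names partial_sums, parallel_sums and n_chunks are hypothetical and not part of the original code. Each worker handles a whole chunk rather than a single index, so the per-task overhead is paid only a few times.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor


def partial_sums(chunk):
    """Return (dot product, plain sum) for one (ps, hs) chunk.

    Hypothetical helper; a module-level function so it can be pickled.
    """
    ps_chunk, hs_chunk = chunk
    return np.dot(ps_chunk, hs_chunk), np.sum(ps_chunk)


def parallel_sums(ps, hs, n_chunks=4):
    """Split the arrays into n_chunks pieces and reduce them in worker processes."""
    chunks = zip(np.array_split(ps, n_chunks), np.array_split(hs, n_chunks))
    with ProcessPoolExecutor(max_workers=n_chunks) as executor:
        results = list(executor.map(partial_sums, chunks))
    # Combine the partial results in the parent process.
    dot_total = sum(r[0] for r in results)
    sum_total = sum(r[1] for r in results)
    return dot_total, sum_total


if __name__ == '__main__':
    ps = np.random.sample(100000)
    hs = np.random.sample(100000)

    dWhy = np.zeros(300)
    dby = np.zeros(300)

    dot_total, sum_total = parallel_sums(ps, hs)
    dWhy += dot_total  # scalar is broadcast over all 300 entries
    dby += sum_total
    print(dWhy[0], dby[0])
```

For arrays of this size, starting the processes and copying the chunks will probably cost more than the arithmetic itself, so this only pays off when each chunk carries substantially more work.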

vy32

Consider implementing it with broadcasting instead:

import numpy as np
dWhy = np.random.sample(300)
dby = np.random.sample(300)

ps = np.random.sample(100000)
targets = np.random.sample(100000)
hs = np.random.sample(100000)

dWhy += np.dot(ps,hs)
dby += np.sum(ps)

Running it 20000 times takes about 3.2 s in total, i.e. roughly 0.00016 s per run:

import time

timer = time.time()
for i in range(20000):
    dWhy += np.dot(ps,hs)
    dby += np.sum(ps)
elapsed = time.time() - timer
print(elapsed)
>>3.2034592628479004
print(elapsed/20000)
>>0.00016017296314239503
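
To check that the vectorized form gives the same result as the original loop, the two can be compared on a smaller input. This is a sketch under the assumption that ps and hs are one-dimensional, as in the question's example. The array length of 1000, the zero-initialised accumulators and the variable names ending in _loop and _vec are choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
ps = rng.random(1000)
hs = rng.random(1000)

# Loop version, mirroring the question's code (each element is a scalar,
# so np.dot(dy, hs[t].T) reduces to a plain product).
dWhy_loop = np.zeros(300)
dby_loop = np.zeros(300)
for t in range(len(ps)):
    dy = ps[t]
    dWhy_loop += dy * hs[t]
    dby_loop += dy

# Vectorized version: one dot product and one sum, broadcast over 300 entries.
dWhy_vec = np.zeros(300) + np.dot(ps, hs)
dby_vec = np.zeros(300) + np.sum(ps)

# The results agree up to floating-point rounding.
print(np.allclose(dWhy_loop, dWhy_vec), np.allclose(dby_loop, dby_vec))
```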
Peter Mølgaard Pallesen