I am trying to write a script in which an asset is manipulated twice, with the second manipulation building on the first. Assuming that the first takes significantly longer than the second, I thought the best approach would be to have multiple workers on the first step and a single worker on the second. However, I am having trouble coding that. I set up an example to demonstrate.
In this example, the asset is a list of strings. The first manipulation calculates the hash values of the strings and the second sums the digits of the calculated hashes. The runtimes are artificially tweaked with time.sleep to create the described effect. To make the manipulations clear, here are the steps for a single string:
>>> s = 'foobar'
>>> h = hash(s)
>>> h
5857481616689290475
>>> n = sum(int(v) for v in str(abs(h)))
>>> n
101
I have managed to get it to work with one process handling each manipulation (see below), but I want to have multiple processes on the first one.
import multiprocessing as mp
from time import sleep
from random import random

def hasher(q, l, words):
    for word in words:
        l.acquire()
        h = hash(word)
        print('hash of {} is {}'.format(word, h))
        l.release()
        sleep(1.0 * random())
        q.put(h)
    q.put('END')

def summer(q, l):
    while True:
        data = q.get()
        sleep(0.1 * random())
        if data == 'END':
            break
        else:
            l.acquire()
            print('sum of {} is {}'.format(data, sum(int(x) for x in str(abs(data)))))
            l.release()

if __name__ == '__main__':
    queue = mp.Queue()
    lock = mp.Lock()
    words = ['fwiimaafqa', 'nuwivfmgdc', 'foymwgcbut', 'sefmayofio', 'crbgzpihpa',
             'xsioddsfyw', 'zbefmckkyi', 'vkxymewyvt', 'ryrvrfkjqf', 'zobdvstxfh']
    bots = []
    for _ in range(1):
        bot = mp.Process(target=hasher, args=(queue, lock, words))
        bot.start()
        bots.append(bot)
    bot2 = mp.Process(target=summer, args=(queue, lock))
    bot2.start()
    for bot in bots:
        bot.join()
    bot2.join()
Can someone help me start more hashers in parallel?
P.S. I also tried with Pool, but there I could not get the summer to work in parallel.
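For reference, here is a rough sketch of the layout being asked about. It is not a confirmed solution, just one possible arrangement under stated assumptions: each hasher receives its own slice of the words and puts its own 'END' marker on the queue, and the summer stops only after it has seen one marker per hasher. The names SENTINEL, digit_sum, split_words and n_hashers are made up for this sketch, the value of 3 hashers is arbitrary, and the sleep is shortened.

```python
import multiprocessing as mp
from random import random
from time import sleep

SENTINEL = 'END'  # same end marker as in the code above


def digit_sum(h):
    """Sum the decimal digits of an integer, ignoring its sign."""
    return sum(int(d) for d in str(abs(h)))


def split_words(words, n):
    """Deal the words round-robin into n lists, one per hasher (hypothetical helper)."""
    return [words[i::n] for i in range(n)]


def hasher(q, l, words):
    for word in words:
        h = hash(word)
        with l:  # the lock only guards printing
            print('hash of {} is {}'.format(word, h))
        sleep(0.1 * random())  # stand-in for the slow first step
        q.put(h)
    q.put(SENTINEL)  # every hasher announces that it is done


def summer(q, l, n_hashers):
    finished = 0
    while finished < n_hashers:  # stop only after all hashers have finished
        data = q.get()
        if data == SENTINEL:
            finished += 1
            continue
        with l:
            print('sum of {} is {}'.format(data, digit_sum(data)))


if __name__ == '__main__':
    n_hashers = 3  # assumption: arbitrary number of first-stage workers
    words = ['fwiimaafqa', 'nuwivfmgdc', 'foymwgcbut', 'sefmayofio', 'crbgzpihpa',
             'xsioddsfyw', 'zbefmckkyi', 'vkxymewyvt', 'ryrvrfkjqf', 'zobdvstxfh']
    queue = mp.Queue()
    lock = mp.Lock()
    bots = [mp.Process(target=hasher, args=(queue, lock, chunk))
            for chunk in split_words(words, n_hashers)]
    bot2 = mp.Process(target=summer, args=(queue, lock, n_hashers))
    for bot in bots + [bot2]:
        bot.start()
    for bot in bots + [bot2]:
        bot.join()
```

Counting markers in the summer avoids the problem where the first hasher to finish would otherwise end the summer while the other hashers are still producing values.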