1

I wrote a little program to check a bunch of hashes, and I was wondering if this is a correct application of starmap and if it should be making other processes. My current understanding is that starmap is supposed to divide the task amongst the workers, but it doesn't seem to do it.

from multiprocessing import Pool
from hashlib import sha256
import datetime


start = datetime.datetime.now()

def hash(text):
    return sha256(text.encode("utf-8")).hexdigest()

string = "Baguette "

def f(r, z, string):
    for i in range(r, z):
        j = hash(string + str(i))
        if j.startswith("000000"):
            print(i, j[0:6])

if __name__ == '__main__':
    with Pool(8) as p:
        p.starmap(f, [[0, 10000000, string]])
        p.starmap(f, [[20000000, 30000000, string]])
        p.starmap(f, [[30000000, 40000000, string]])
        p.starmap(f, [[40000000, 50000000, string]])
    print(datetime.datetime.now()-start)

Python apparently has 9 processes open, but only one is using the CPU (11.9%)

NeffPeff
  • 46
  • 5
  • You're probably spending most of the CPU on the `print` line which is only handled by one of the processes. Only one process will take stdout and send it to the console. Try taking that out and re-evaluating – flakes Apr 19 '22 at 00:12
  • @flakes running with this: `def f(r, z, string): for i in range(r, z): j = hash(string + str(i)) ` it still shows only around 12% usage. – NeffPeff Apr 19 '22 at 00:28
  • 1
    Oh I see the issue now. `starmap` and `map` blocks until the result is ready. You want to do all of this in one call to `starmap`. i.e. `p.starmap(f, [[0, 10000000, string], [20000000, 30000000, string], [30000000, 40000000, string], [40000000, 50000000, string]])` – flakes Apr 19 '22 at 00:47
  • Or you can use `starmap_async` and later gather the results of the `AsyncResult` return values. – flakes Apr 19 '22 at 00:49

1 Answers1

1
from multiprocessing import Pool
from hashlib import sha256
import datetime


start = datetime.datetime.now()

def hash(text):
    return sha256(text.encode("utf-8")).hexdigest()

string = "Baguette "

def f(r, z, string):
    for i in range(r, z):
        j = hash(string + str(i))
        if j.startswith("000000"):
            print(i, j[0:6])

if __name__ == '__main__':
    with Pool(8) as p:
        p.starmap(f, [[0, 10000000, string], [20000000, 30000000, string], [30000000, 40000000, string], [40000000, 50000000, string]])
    print(datetime.datetime.now()-start)

Thank you @flakes for the good answer :)

flakes' comment: Is my understanding of multiprocessing's starmap correct?

NeffPeff
  • 46
  • 5