3

I'm trying to hash many image blocks, but Python doesn't use the full CPU power; it only consumes about 25%. I tried moving the heavy work into threads, but it made no difference. I come from Node.js, where the sharp library saturates all cores on the same task. How do I make Python use the full CPU power?

import cv2
import math
import datetime
import hashlib
import threading

def thread_function(image, yPos, xPos, wSizeBlock, hSizeBlock):
    block = image[yPos:yPos+wSizeBlock, xPos:xPos+hSizeBlock]

    hash = hashlib.sha256()
    hash.update(block.tobytes())
    print(hash.hexdigest())

image = cv2.imread('frame323.jpg', cv2.IMREAD_COLOR)
dimension = {
    'width': image.shape[1],
    'height': image.shape[0]
}

wSizeBlock = int(16)
hSizeBlock = int(16)
wBlockLength = math.floor(dimension['width'] / wSizeBlock)
hBlockLength = math.floor(dimension['height'] / hSizeBlock)
count = 0

start_time = datetime.datetime.now()
print(start_time)
for k in range(0, 500):
    for i in range(0, wBlockLength):
        for j in range(0, hBlockLength):
            xPos = int(i*wSizeBlock)
            yPos = int(j*hSizeBlock)

            x = threading.Thread(target=thread_function, args=(image, xPos, yPos, wSizeBlock, hSizeBlock))
            x.start()
            count += 1
    count = 0

end_time = datetime.datetime.now()
print(end_time)
asked by imagesck
  • You might want to use [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) – Kraigolas Jun 24 '22 at 22:20
  • Spawning a thread per work item is wasteful. Use a "Thread Pool". – Christoph Rackwitz Jun 25 '22 at 08:54
  • your indexing is wrong. you've been told in [the previous question](https://stackoverflow.com/questions/72749278/how-to-pass-value-into-opencv-crop). you are mixing up x with height and y with width. – Christoph Rackwitz Jun 25 '22 at 08:55
  • Does this answer your question? [Threading pool similar to the multiprocessing Pool?](https://stackoverflow.com/questions/3033952/threading-pool-similar-to-the-multiprocessing-pool) – Christoph Rackwitz Jun 25 '22 at 08:59

1 Answer

2

For CPU-intensive work that can be split into smaller tasks, you want the multiprocessing module rather than threading. Because of CPython's Global Interpreter Lock (GIL), threads running pure-Python code cannot execute bytecode in parallel, so CPU-bound threads end up sharing one core; separate processes each get their own interpreter and can use a full core. The multiprocessing API deliberately mirrors threading's. Syntax looks something like this:

import multiprocessing as mp


def add(a, b):
    return a + b


if __name__ == '__main__':  # guard required so spawned children don't re-run this block
    p = mp.Process(target=add, args=(1, 2))
    p.start()
    p.join()  # wait for the worker process to finish
  • that's gonna make the whole thing even more wasteful. in the question, per work item, a thread was spawned. now you're proposing to spawn an entire process per work item. further, you are implying that `threading` can't "run multiple functions at once", which is wrong. that is precisely what threads are for. if any library function fails to release the GIL during compute-intensive operations, that's another problem. most serious libraries release the GIL. – Christoph Rackwitz Jun 25 '22 at 08:57