I have code which looks like this:
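```python
import cv2
from multiprocessing.pool import ThreadPool
from tqdm import tqdm

def get_image_stats(fp):
    img = cv2.imread(fp)  # load the image file into a numpy array of shape (H, W, C)
    return img.shape[0], img.shape[1], img.shape[0] / img.shape[1]

# df is a pandas DataFrame with a file_path column
with ThreadPool(16) as pool:
    res = list(tqdm(pool.imap_unordered(get_image_stats, df.file_path), total=len(df)))
heights, widths, ars = list(zip(*res))
```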
The only library-specific part there is `cv2.imread`, which simply loads an image file into a numpy array, so it's I/O bound.
Why would my CPU usage look like this?
Notes on that image:
- The horizontal axis is time in seconds, and the vertical axis is CPU usage from 0% to 100%. The update interval is 1 second.
- I started the script at 40 s.
- It's not easy to see, but there are 16 cores.
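To get numbers rather than reading them off the graph, per-core usage can be sampled at the same 1-second interval. This is a minimal sketch that assumes the third-party `psutil` package is installed. The `sample_per_core` helper is hypothetical. The printed `mean` is the average across all logical cores, which corresponds to the "percentage of all cores" figure in the question.

```python
import psutil


def sample_per_core(seconds=5):
    """Hypothetical helper: print per-core CPU usage once per second and return the samples."""
    samples = []
    for _ in range(seconds):
        # Blocks for 1 s, then returns one percentage per logical CPU.
        per_core = psutil.cpu_percent(interval=1, percpu=True)
        mean = sum(per_core) / len(per_core)
        print(f"mean={mean:5.1f}%  " + " ".join(f"{p:3.0f}" for p in per_core))
        samples.append(per_core)
    return samples
```

For example, calling `sample_per_core(10)` from a second terminal while the script runs would log ten seconds of per-core readings.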
Another note: I did not set n_workers to 16 because I have 16 cores; it's just a coincidence.
So why is this using up 75% of 16 cores at once?