Given that:
- 2GHz processors have been available for around 20 years and only around 3GHz is "mainstream" today, and
- quad-core and up to 16-core processors are fairly common nowadays
it would seem processors are becoming "fatter" (more cores) rather than "taller" (more GHz). So you would probably do well to leverage that.
Given that Python has a GIL, it is not well suited to multi-threading for CPU-bound work. You would probably do better to use multi-processing instead, letting each core work independently on a whole image so as to minimise the amount of pickling and data-sharing between processes.
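As a rough sketch of that shape, using a standard `multiprocessing.Pool` - the `process_file()` name matches John's answer, and the glob pattern is just a placeholder:

```python
import multiprocessing
from glob import glob

def process_file(filename):
    """Tile one image - e.g. one of the withPIL()/withOpenCV() functions below."""
    ...

if __name__ == '__main__':
    # hypothetical input set - adjust the pattern to your files
    files = glob('*.jpg')
    # one worker process per core by default; each worker only receives
    # a filename, so very little data is pickled between processes
    with multiprocessing.Pool() as pool:
        pool.map(process_file, files)
```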
You didn't mention the format or dimensions of your images. If they are JPEGs, you might consider using turbo-jpeg. If they are very large, memory may be an issue with multiprocessing.
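If you do go the turbo-jpeg route, the PyTurboJPEG bindings make the decode step look roughly like this - a minimal sketch, and `image.jpg` is just an illustrative filename:

```python
from turbojpeg import TurboJPEG

jpeg = TurboJPEG()                  # wraps libjpeg-turbo
with open('image.jpg', 'rb') as f:
    bgr = jpeg.decode(f.read())     # decodes to a BGR numpy array
```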
The likely candidates are:
- OpenCV
- vips
- Pillow
- ImageMagick/wand
but it will depend on many things:
- CPU - GHz, cores, generation
- RAM - amount, speed, channels, timings
- disk subsystem - spinning, SSD, NVMe
- image format, dimensions, bit-depth
So you'll need to benchmark. I did some similar benchmarking here.
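A quick way to get comparable numbers on your own machine is a small `timeit` harness - this assumes the `withPIL()`/`withOpenCV()` functions shown below are defined in your main script, and `image.jpg` is a sample of your own images:

```python
import timeit

# time each candidate tiler over the same file
for fn in ('withPIL', 'withOpenCV'):
    t = timeit.timeit(f"{fn}('image.jpg')",
                      setup=f"from __main__ import {fn}",
                      number=10)
    print(f"{fn}: {t/10:.3f} s per image")
```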
If you want to replace the process_file() in John's answer with a PIL or OpenCV version, it might look like this:
```python
import pathlib
from PIL import Image
import cv2

def withPIL(filename):
    # make an output directory per input image
    pathlib.Path(f"out/{filename}_tiles").mkdir(parents=True, exist_ok=True)
    image = Image.open(filename)
    # split into a 3x3 grid of 1280x720 tiles
    for y in range(3):
        for x in range(3):
            top, left = y*720, x*1280
            tile = image.crop((left, top, left+1280, top+720))
            tile.save(f"out/{filename}_tiles/{x}_{y}.png")

def withOpenCV(filename):
    # make an output directory per input image
    pathlib.Path(f"out/{filename}_tiles").mkdir(parents=True, exist_ok=True)
    image = cv2.imread(filename, cv2.IMREAD_COLOR)
    # split into a 3x3 grid of 1280x720 tiles via numpy slicing
    for y in range(3):
        for x in range(3):
            top, left = y*720, x*1280
            tile = image[top:top+720, left:left+1280]
            cv2.imwrite(f"out/{filename}_tiles/{x}_{y}.png", tile)
```
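For comparison, since vips is on the candidate list above, a pyvips version of the same tiler might look roughly like this - an untested sketch of the same 3x3 grid of 1280x720 tiles:

```python
import pathlib
import pyvips

def withVips(filename):
    # make an output directory per input image
    pathlib.Path(f"out/{filename}_tiles").mkdir(parents=True, exist_ok=True)
    image = pyvips.Image.new_from_file(filename)
    for y in range(3):
        for x in range(3):
            # pyvips crop takes (left, top, width, height)
            tile = image.crop(x*1280, y*720, 1280, 720)
            tile.write_to_file(f"out/{filename}_tiles/{x}_{y}.png")
```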