I have a dataset of around 3000 images, and I want to crop multiple regions out of each image using bounding-box coordinates that I already have. The only problem is that my code is extremely slow. I've tried profiling and using Cython, but with only marginal improvements. I'm using the Pillow library for cropping; is there perhaps a faster way of achieving this task?
The bounding box locations are stored in a CSV file. The code below iterates over every image listed in it:
import pandas as pd

train_label = pd.read_csv("train.csv")
# One row per image: its id and a space-separated string of labels
for name, labels in zip(train_label["image_id"], train_label["labels"]):
    split_images(name, labels)
And the function which does the heavy lifting is below.
import os

import numpy as np
from PIL import Image

def split_images(name, labels):
    # Each label is five space-separated fields: unicode, x, y, w, h
    boundingboxes = np.array(labels.split(' ')).reshape(-1, 5)
    for (unicode, x, y, w, h) in boundingboxes:
        try:
            # Create target directory
            os.mkdir('unicodes/{}'.format(str(unicode)))
        except FileExistsError:
            pass
        (x, y, w, h) = (int(x), int(y), int(w), int(h))
        imsource = Image.open('train_images/{}.jpg'.format(name))
        cropped_image = imsource.crop((x, y, x + w, y + h))
        cropped_image.save('unicodes/{}/{}.jpg'.format(unicode, name))
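For reference, here is a minimal sketch of how each labels string is turned into crop boxes. The helper name parse_labels and the example string are made up for illustration; the real values come from train.csv, and I'm assuming every label has exactly five space-separated fields.

```python
import numpy as np

def parse_labels(labels):
    """Split a space-separated labels string into (unicode, x, y, w, h) tuples."""
    # Same parsing step as in split_images: five fields per bounding box
    fields = np.array(labels.split(' ')).reshape(-1, 5)
    return [(str(u), int(x), int(y), int(w), int(h)) for u, x, y, w, h in fields]

# Made-up example value, not taken from train.csv
example = "U+306F 10 20 30 40 U+304C 50 60 70 80"
boxes = parse_labels(example)

for code, x, y, w, h in boxes:
    # Pillow's Image.crop expects (left, upper, right, lower)
    print(code, (x, y, x + w, y + h))
```

Each tuple corresponds to one call to crop and one saved file in the loop above.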
I'm running the code remotely on Google Cloud Platform, if that helps.