I want to find an efficient way to crop images in batch. The input image is the same for every crop. Each crop has its own input offsets, height, and width.

Naive code:

import numpy as np

img = np.zeros([100, 100, 3])
ofsets_x = np.array([10, 15, 18])
img_w = np.array([10, 12, 15])
ofsets_y = np.array([20, 22, 14])
img_h = np.array([14, 12, 16])

crops = []
for i in range(ofsets_x.shape[0]):
    ofset_x = ofsets_x[i]
    ofset_y = ofsets_y[i]
    w = img_w[i]
    h = img_h[i]

    # slice with the per-crop scalars, not the full offset arrays
    crop = img[ofset_x:ofset_x + w, ofset_y:ofset_y + h, :]
    crops.append(crop)

Because of the loop, this is very slow in both NumPy and TensorFlow (in TensorFlow I also resize each crop to a specific size at the end of each iteration with tf.image.resize). In TensorFlow I have also tried tf.vectorized_map and tf.while_loop; neither gave me a significant speedup. All of this is more than 20x slower than in C++. A crop is a simple memcpy, so it should be very fast, especially with preallocated memory.

How can I do this faster in NumPy or TensorFlow?

Brans
  • Are you certain that cropping is the bottleneck, not the resizing? Your cropped images have different sizes, so tf.image.resize (which is a much heavier operation) can't be applied to them as a batch. – Quang Hoang Jun 15 '22 at 14:25
  • In numpy this is the best you can do. Each crop has to be a separate indexing operation. – hpaulj Jun 15 '22 at 14:32
  • @QuangHoang I have measured it, and to my surprise the cropping takes about the same time as the resizing (checked by removing it). And I am using lanczos5 interpolation. Probably a new tensor is created for every crop and that takes time. I have 60 slices on 16 images of 1024 * 800 at each step. – Brans Jun 15 '22 at 14:34
  • @I'mahdi It's OK if it uses Pillow's Image.resize or another image resize. – Brans Jun 15 '22 at 14:55
  • Just a little improvement: `crops = [img[x_start:x_end, y_start:y_end] for x_start, x_end, y_start, y_end in zip(ofsets_x, ofsets_x + img_w, ofsets_y, ofsets_y + img_h)]` – Mechanic Pig Jun 15 '22 at 15:07
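
A small sketch of why the slicing itself is cheap in NumPy, reusing the arrays from the question: basic slicing returns views into img, so the loop copies no pixel data. A copy only happens later, for example when a crop is written into a separate buffer. np.shares_memory is used here purely as a check and is not needed in real code.

```python
import numpy as np

img = np.zeros([100, 100, 3])
ofsets_x = np.array([10, 15, 18])
img_w = np.array([10, 12, 15])
ofsets_y = np.array([20, 22, 14])
img_h = np.array([14, 12, 16])

# One basic-slicing operation per crop; each result is a view, not a copy.
crops = [img[x:x + w, y:y + h]
         for x, y, w, h in zip(ofsets_x, ofsets_y, img_w, img_h)]

# Check that every crop still points into the memory of img.
all_views = all(np.shares_memory(crop, img) for crop in crops)
shapes = [crop.shape for crop in crops]
print(all_views, shapes)
```

If the per-crop cost still matters, the remaining overhead is mostly the Python loop and whatever is done with each crop afterwards, not the slicing.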

1 Answer

I wrote the code in TensorFlow, as tagged in your question. I create a dataset with tf.data.Dataset and use map. I tested with 5_000 images and got 751 ms for cropping and resizing them. (I ran the code in Colab with limited RAM, so I only measured the run time for 5_000 images.) I repeat each image three times and store the index of the crop next to each image in the dataset, so that map can run in parallel; the mapped function uses that index to select the offsets, crops the image, and then resizes it.

Creating the image dataset for the benchmark:

import numpy as np
import tensorflow as tf
num_imgs = 5_000
len_ofset = 3
img = np.random.rand(num_imgs, 100, 100, 3)
img_dataset = tf.data.Dataset.from_tensor_slices((np.tile(img, (len_ofset,1,1,1)), 
                                                  np.repeat(np.arange(len_ofset), num_imgs)))


ofsets_x = np.array([10, 15, 18])
img_w = np.array([10, 12, 15])
ofsets_y = np.array([20, 22, 14])
img_h = np.array([14, 12, 16])

# convert the offsets to tensors so they can be indexed inside the mapped function
tns_ofsets_x = tf.convert_to_tensor(ofsets_x)
tns_img_w = tf.convert_to_tensor(img_w)
tns_ofsets_y = tf.convert_to_tensor(ofsets_y)
tns_img_h = tf.convert_to_tensor(img_h)

Benchmark in Colab (assuming you want to resize the crops to (16, 16)):

%%time
size_resize = 16
def crop_resize(img, idx_crop):
    ofset_x = tns_ofsets_x[idx_crop]
    ofset_y = tns_ofsets_y[idx_crop]
    w = tns_img_w[idx_crop]
    h = tns_img_h[idx_crop]
    img = img[ofset_x:ofset_x + w, ofset_y:ofset_y + h, :] 
    img = tf.image.resize(img, (size_resize, size_resize))
    return img

img_dataset = img_dataset.map(
    map_func = crop_resize,
    num_parallel_calls=tf.data.AUTOTUNE
    )

next(iter(img_dataset.take(1))).shape
# TensorShape([16, 16, 3])

Output:

CPU times: user 714 ms, sys: 2.07 s, total: 2.78 s
Wall time: 3.64 s
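
As an alternative sketch, not part of the benchmark above: TensorFlow also provides tf.image.crop_and_resize, which crops several boxes and resizes them to one common size in a single op. The helper name batch_crop_resize is hypothetical. Two caveats: the op only supports bilinear and nearest interpolation, so it is not a replacement for lanczos5, and it samples between the box corners, so the result can differ slightly from slicing followed by tf.image.resize. Boxes are normalized [y1, x1, y2, x2], where y runs along the first image axis, which is the axis this thread calls x.

```python
import numpy as np
import tensorflow as tf

def batch_crop_resize(img, row_start, row_len, col_start, col_len, out_size):
    """Crop several boxes from one image and resize them in a single op."""
    img = tf.convert_to_tensor(img, dtype=tf.float32)
    height, width = img.shape[0], img.shape[1]
    row_start = np.asarray(row_start, dtype=np.float32)
    row_len = np.asarray(row_len, dtype=np.float32)
    col_start = np.asarray(col_start, dtype=np.float32)
    col_len = np.asarray(col_len, dtype=np.float32)
    # A normalized coordinate v maps to pixel v * (size - 1), so the last
    # pixel of each crop is start + length - 1.
    boxes = np.stack([
        row_start / (height - 1),
        col_start / (width - 1),
        (row_start + row_len - 1) / (height - 1),
        (col_start + col_len - 1) / (width - 1),
    ], axis=1).astype(np.float32)
    # Every box refers to image 0 of the single-image batch.
    box_indices = tf.zeros([boxes.shape[0]], dtype=tf.int32)
    return tf.image.crop_and_resize(
        img[tf.newaxis], boxes, box_indices, [out_size, out_size])

rng = np.random.default_rng(0)
img = rng.random((100, 100, 3)).astype(np.float32)
# In this thread ofsets_x / img_w index the first axis (rows).
crops = batch_crop_resize(img,
                          row_start=[10, 15, 18], row_len=[10, 12, 15],
                          col_start=[20, 22, 14], col_len=[14, 12, 16],
                          out_size=16)
print(crops.shape)
```

For a batch of several images, box_indices would instead hold the index of the image that each box belongs to.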
I'mahdi