
Is there a way to detect and remove zero padding within an image array? In a way, my question is very similar to this one, except that the image has already been rotated and I do not know the angle.

I am basically cropping a box out of a larger image that may have zero padding around the edges (due to translations or rotations). It's possible that the crop contains some of this padding; in such cases, I want to clip the box where the padding edge starts. The images are in CHW format (this can easily be changed to HWC).

In this case, the padding will be 0 in all channels. However, due to rotations, the 0s may not always lie in perfectly horizontal or vertical strips of the array. Is there a way to detect whether 0s extend all the way to the edge of the array, and where that padding edge starts?

Example 1, where arr is an image with 3 channels and a height and width of 4 (shape (3, 4, 4)), and the crop contains vertical padding on the rightmost edge:

array([[[1., 1., 1., 0.],
        [1., 1., 1., 0.],
        [1., 1., 1., 0.],
        [1., 1., 1., 0.]],

       [[1., 1., 1., 0.],
        [1., 1., 1., 0.],
        [1., 1., 1., 0.],
        [1., 1., 1., 0.]],

       [[1., 1., 1., 0.],
        [1., 1., 1., 0.],
        [1., 1., 1., 0.],
        [1., 1., 1., 0.]]])

In this example, I would slice the array as follows to get rid of the zero padding: arr[:, :, :-1]
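When the padding forms straight strips, as in example 1, this slice can be found automatically by dropping edge rows and columns that are zero in every channel (the "sum along the columns and rows" idea from the comments, written here as an all-zero test). A minimal NumPy sketch; the function name trim_zero_edges is hypothetical:

```python
import numpy as np

def trim_zero_edges(arr, pad_value=0.0):
    """Strip edge rows/columns that are pad_value in every channel (CHW input).

    Only full-length straight strips are removed; diagonal padding is not.
    """
    # True where a pixel is padding in every channel -> shape (H, W)
    is_pad = np.all(arr == pad_value, axis=0)
    pad_rows = np.all(is_pad, axis=1)  # rows made entirely of padding
    pad_cols = np.all(is_pad, axis=0)  # columns made entirely of padding
    top, bottom = 0, is_pad.shape[0]
    left, right = 0, is_pad.shape[1]
    # Move each border inward while it sits on an all-padding line
    while top < bottom and pad_rows[top]:
        top += 1
    while bottom > top and pad_rows[bottom - 1]:
        bottom -= 1
    while left < right and pad_cols[left]:
        left += 1
    while right > left and pad_cols[right - 1]:
        right -= 1
    return arr[:, top:bottom, left:right]

# Example 1: padding in the rightmost column
arr = np.ones((3, 4, 4))
arr[:, :, 3] = 0.0
print(trim_zero_edges(arr).shape)  # (3, 4, 3)

# Example 2: padding in the top-right corner
arr2 = np.ones((3, 4, 4))
arr2[:, 0, 2:] = 0.0
arr2[:, 1, 3] = 0.0
print(trim_zero_edges(arr2).shape)  # (3, 4, 4)
```

As the second call shows, this leaves example 2 untouched, because no complete edge row or column there is padding.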

Example 2, where there is some padding in the top-right corner:

array([[[1., 1., 0., 0.],
        [1., 1., 1., 0.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 0., 0.],
        [1., 1., 1., 0.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 0., 0.],
        [1., 1., 1., 0.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

In this example (calling the array arr2), I would crop the image to remove all padding by returning arr2[:, 1:, :-1].
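A quick NumPy check of this choice (is_padding_free is a hypothetical helper introduced here): the crop contains no padding and keeps 9 pixels, while the two strip-shaped alternatives, dropping the top two rows or the right two columns, keep only 8.

```python
import numpy as np

# Example 2 from the question, shape (3, 4, 4)
arr2 = np.ones((3, 4, 4))
arr2[:, 0, 2:] = 0.0
arr2[:, 1, 3] = 0.0

def is_padding_free(crop, pad_value=0.0):
    # A pixel counts as padding only if it equals pad_value in every channel
    return not np.any(np.all(crop == pad_value, axis=0))

for name, crop in [("arr2[:, 1:, :-1]", arr2[:, 1:, :-1]),
                   ("arr2[:, 2:, :]", arr2[:, 2:, :]),
                   ("arr2[:, :, :-2]", arr2[:, :, :-2])]:
    h, w = crop.shape[1:]
    print(name, is_padding_free(crop), h * w)
```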

I want to do this in TensorFlow, so tensor operations would be great, but I am primarily trying to figure out any algorithm (for example, using NumPy) that achieves this result.

skbrhmn
  • Sum along the columns and then along the rows. Any column or row with a sum of zero should be removed. – David Hoffman Aug 04 '20 at 04:27
  • @DavidHoffman Yes, I can do that for rows and columns starting at the edges and would work for example 1. However, this would fail when the 0s are not strictly vertical or horizontal (example 2). – skbrhmn Aug 04 '20 at 05:46
  • Scan the image outline and seed-fill every time you meet a zero. In the end all zero pixels will be filled. How you remove them is your policy. –  Aug 04 '20 at 07:42
  • Is it possible for the image to contain `padding_value` (in this case 0)? If so, is it possible for the image to contain it on its border? Should this be preserved? – FirefoxMetzger Aug 04 '20 at 09:25
  • Also, `arr2[:, 1:, :-1]` will keep 9 pixels, whereas each of the crops you suggested keeps only 8 pixels. – FirefoxMetzger Aug 04 '20 at 09:33
  • Can't you just remove the padding in the source image *before* you do the cropping? This way it should be trivial. – T A Aug 04 '20 at 16:29
  • @YvesDaoust Can you elaborate on what you mean? It may be helpful. I think my policy in how I want them removed is clear from the examples? My primary question 'is' about how to remove them. – skbrhmn Aug 05 '20 at 20:11
  • @FirefoxMetzger I am not sure I understand your question. It is unlikely that the image will contain the padding value across all channels for a large group of pixels. If this is the case, it can be just discarded. Ah, yes, you are right about `arr2[:, 1:, :-1]` – skbrhmn Aug 05 '20 at 20:12
  • @TA Unfortunately not, which is why this is a bit more challenging. – skbrhmn Aug 05 '20 at 20:12
  • Can you elaborate on what you mean by *remove*? –  Aug 05 '20 at 20:14
  • @YvesDaoust Sorry for the ambiguous terminology. By 'remove', I mean cropping it out of the final image array. Given the input in example 1, the output should be `arr[:, :, :-1]`. In example 2, the output should be `arr2[:, 1:, :-1]`, as FirefoxMetzger suggested. – skbrhmn Aug 05 '20 at 20:19
  • @skbrhmn: when you have several cropping choices, what do you do? –  Aug 06 '20 at 07:35
  • @YvesDaoust Ideally, you'd want to select the largest possible crop – skbrhmn Aug 06 '20 at 16:01
  • @skbrhmn: what in case of ties? –  Aug 06 '20 at 17:58
  • @YvesDaoust The main criterion for selection is getting the largest area possible, as I already stated. If there are ties, one may be chosen arbitrarily. – skbrhmn Aug 06 '20 at 20:57
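A minimal sketch of the seed-fill idea from the comments, assuming a CHW array, 4-connectivity, and that padding means pad_value in every channel (edge_padding_mask is a hypothetical name). It starts from every zero pixel on the image outline and spreads through neighbouring zeros, so only zero regions connected to the border are marked:

```python
from collections import deque

import numpy as np

def edge_padding_mask(arr, pad_value=0.0):
    """Mark padding pixels connected to the image border (CHW input)."""
    is_zero = np.all(arr == pad_value, axis=0)  # (H, W)
    h, w = is_zero.shape
    padded = np.zeros((h, w), dtype=bool)
    queue = deque()
    # Seed with every zero pixel on the outline of the image
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and is_zero[y, x]:
                padded[y, x] = True
                queue.append((y, x))
    # Breadth-first flood fill through 4-connected zero pixels
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and is_zero[ny, nx] and not padded[ny, nx]:
                padded[ny, nx] = True
                queue.append((ny, nx))
    return padded

# Example 2 plus an interior zero that is part of the image, not padding
arr2 = np.ones((3, 4, 4))
arr2[:, 0, 2:] = 0.0
arr2[:, 1, 3] = 0.0
arr2[:, 2, 1] = 0.0
print(edge_padding_mask(arr2).astype(int))
```

Here the three corner pixels of example 2 are marked, while the interior zero at row 2, column 1 is not, because it never touches the border.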

2 Answers


If you don't mind throwing away some of the image and are okay with cropping more than strictly necessary, as long as the result contains no padding, there is a quite efficient solution:

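import numpy as np

pad_value = 0.0
arr = np.ones((3, 4, 4))  # test image (C, H, W); here example 2 from the question
arr[:, 0, 2:] = 0.0
arr[:, 1, 3] = 0.0
# True where no channel equals pad_value, shape (H, W)
arr_masked = np.all(arr != pad_value, axis=0)
# worst-case first non-padding row (over columns) and column (over rows)
y_low = np.max(np.argmax(arr_masked, axis=0))
x_low = np.max(np.argmax(arr_masked, axis=1))
# same from the bottom/right edges, as exclusive upper bounds
y_high = np.min(arr_masked.shape[0] - np.argmax(arr_masked[::-1, :], axis=0))
x_high = np.min(arr_masked.shape[1] - np.argmax(arr_masked[:, ::-1], axis=1))
arr[:, y_low:y_high, x_low:x_high]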
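The same steps wrapped in a function (conservative_crop is a hypothetical name) and applied to both examples from the question; a sketch for inspecting the output, not part of the original answer:

```python
import numpy as np

def conservative_crop(arr, pad_value=0.0):
    """Crop a CHW array to the tightest per-row/per-column padding-free bounds."""
    arr_masked = np.all(arr != pad_value, axis=0)  # True = image pixel
    h, w = arr_masked.shape
    y_low = np.max(np.argmax(arr_masked, axis=0))
    x_low = np.max(np.argmax(arr_masked, axis=1))
    y_high = np.min(h - np.argmax(arr_masked[::-1, :], axis=0))
    x_high = np.min(w - np.argmax(arr_masked[:, ::-1], axis=1))
    return arr[:, y_low:y_high, x_low:x_high]

arr1 = np.ones((3, 4, 4))
arr1[:, :, 3] = 0.0  # example 1
arr2 = np.ones((3, 4, 4))
arr2[:, 0, 2:] = 0.0  # example 2
arr2[:, 1, 3] = 0.0

print(conservative_crop(arr1).shape)  # (3, 4, 3)
print(conservative_crop(arr2).shape)  # (3, 2, 2)
```

For example 2 it keeps only arr2[:, 2:, :2] (4 pixels) rather than the 9-pixel crop arr2[:, 1:, :-1], which is the trade-off for its simplicity.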

If it has to be the biggest possible crop, more work is needed. Essentially, we have to check every rectangular sub-image for padding and then compare the valid ones by size.
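For small crops, this exhaustive search can be written directly as a plain loop, using a summed-area table so that counting the padded pixels in each rectangle costs O(1). A minimal sketch, assuming CHW input and treating every pixel that equals pad_value in all channels as padding (so, unlike an edge-based mask, isolated interior zeros are also avoided); largest_padding_free_crop is a hypothetical name:

```python
import numpy as np

def largest_padding_free_crop(arr, pad_value=0.0):
    """Exhaustive search for the largest padding-free rectangle (CHW input).

    Returns (y1, x1, y2, x2) with exclusive y2/x2; ties keep the first one found.
    """
    is_pad = np.all(arr == pad_value, axis=0).astype(np.int64)
    h, w = is_pad.shape
    # Summed-area table: sat[i, j] = number of padded pixels in is_pad[:i, :j]
    sat = np.zeros((h + 1, w + 1), dtype=np.int64)
    sat[1:, 1:] = is_pad.cumsum(axis=0).cumsum(axis=1)
    best, best_area = None, 0
    for y1 in range(h):
        for y2 in range(y1 + 1, h + 1):
            for x1 in range(w):
                for x2 in range(x1 + 1, w + 1):
                    n_pad = sat[y2, x2] - sat[y1, x2] - sat[y2, x1] + sat[y1, x1]
                    area = (y2 - y1) * (x2 - x1)
                    if n_pad == 0 and area > best_area:
                        best, best_area = (y1, x1, y2, x2), area
    return best

arr2 = np.ones((3, 4, 4))
arr2[:, 0, 2:] = 0.0  # example 2
arr2[:, 1, 3] = 0.0
print(largest_padding_free_crop(arr2))  # (1, 0, 4, 3), i.e. arr2[:, 1:, :-1]
```

The four nested loops make this O(H^2 W^2), so it mainly serves as a readable reference, for example to check a vectorized version on small inputs.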

Main idea: assume that the top-left corner of the padding-free sub-image is at (y1, x1) and the bottom-right corner is at (y2, x2), both inclusive. Then we can store the number of pixels of every candidate sub-image in a rank-4 tensor indexed by [y1, x1, y2, x2]. We set the number of pixels to 0 if the combination is not a valid sub-image, i.e., if it has a non-positive width or height, or if it contains a padded pixel.
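To see how the rank-4 tensor comes about, here is a sketch of only the size computation on a tiny 2 x 3 image (H = 2, W = 3 are arbitrary choices for illustration): each corner coordinate is placed on its own axis, and NumPy broadcasting combines them into an array indexed by [y1, x1, y2, x2].

```python
import numpy as np

H, W = 2, 3  # a tiny image, just to inspect shapes
y = np.arange(H)
x = np.arange(W)
# Each corner coordinate gets its own axis of the tensor [y1, x1, y2, x2]
y1 = y[:, None, None, None]
x1 = x[None, :, None, None]
y2 = y[None, None, :, None]
x2 = x[None, None, None, :]

# Non-positive extents (y2 < y1 or x2 < x1) are clipped to 0, i.e. invalid
height = np.clip(y2 - y1 + 1, 0, None)  # shape (H, 1, H, 1)
width = np.clip(x2 - x1 + 1, 0, None)   # shape (1, W, 1, W)
img_size = height * width               # shape (H, W, H, W)

print(img_size.shape)        # (2, 3, 2, 3)
print(img_size[0, 0, 1, 2])  # whole image: 2 * 3 = 6
print(img_size[1, 0, 0, 0])  # y2 < y1, so 0
```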

import numpy as np

pad_value = 0.0
arr = np.ones((3, 4, 4))  # test image (C, H, W); here example 2 from the question
arr[:, 0, 2:] = 0.0
arr[:, 1, 3] = 0.0

# True where no channel equals pad_value, shape (H, W)
arr_masked = np.all(arr != pad_value, axis=0)

# indices for sub-image tensor
y = np.arange(arr_masked.shape[0])
x = np.arange(arr_masked.shape[1])
y1 = y[:, None, None, None]
y2 = y[None, None, :, None]
x1 = x[None, :, None, None]
x2 = x[None, None, None, :]

# coordinates of padded pixels, scanning inward from each edge
pad_north = np.argmax(arr_masked, axis=0)
pad_west = np.argmax(arr_masked, axis=1)
pad_south = arr_masked.shape[0] - np.argmax(arr_masked[::-1, :], axis=0)
pad_east = arr_masked.shape[1] - np.argmax(arr_masked[:, ::-1], axis=1)

is_padded = np.zeros_like(arr_masked)  # boolean (H, W), initially all False
is_padded[y[:, None] < pad_north[None, :]] = True
is_padded[y[:, None] >= pad_south[None, :]] = True
is_padded[x[None, :] < pad_west[:, None]] = True
is_padded[x[None, :] >= pad_east[:, None]] = True

y_padded, x_padded = np.where(is_padded)
y_padded = y_padded[None, None, None, None, :]
x_padded = x_padded[None, None, None, None, :]

# size of the sub-image
height = np.clip(y2 - y1 + 1, 0, None)
width = np.clip(x2 - x1 + 1, 0, None)
img_size = width * height

# sub-image contains at least one padded pixel
y_inside = np.logical_and(y1[..., None] <= y_padded, y_padded <= y2[..., None])
x_inside = np.logical_and(x1[..., None] <= x_padded, x_padded <= x2[..., None])
contains_border = np.any(np.logical_and(y_inside, x_inside), axis=-1)

# ignore sub-images containing padded pixels
img_size[contains_border] = 0

# find all largest sub-images
tmp = np.where(img_size == np.max(img_size))
rectangles = (tmp[0], tmp[1], tmp[2] + 1, tmp[3] + 1)

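Now rectangles contains the corners (y1, x1, y2, x2), with y2 and x2 exclusive, of all sub-images that have the largest number of pixels without containing any padded pixels. The code is already fairly vectorized, so you should be able to migrate it from NumPy to TensorFlow. Note, however, that the intermediate boolean array used to compute contains_border has H*W*H*W*P elements (P being the number of padded pixels), so memory use grows quickly with the crop size.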

BiBi
FirefoxMetzger
  • So I have tried out your algorithm and it seems to be working for the most part. I need to implement this in TensorFlow as my input is going to be a tensor, but that shouldn't be too difficult. Could you, however, break it down and explain exactly how you are getting the sub-images, e.g. in `y_inside = np.logical_and(y1[..., None] <= y_padded, y_padded<= y2[..., None])`, and the shapes used in `x1, x2, y1, y2`? I understand the overall procedure, but I am having a hard time understanding the implementation and am trying to get a better intuition for each step. – skbrhmn Aug 10 '20 at 17:41
  • @skbrhmn The sequential idea here is that we take each border/padding point and test whether it is contained within the sub-image rectangle. Given a rectangle defined by its corners `(y1, x1, y2, x2)`, we can test whether it contains a point by checking if the point's x-value is within `[x1, x2]` and its y-value is within `[y1, y2]`. The line that you quoted tests the latter (vectorized). We then set `contains_border` to `True` if any border/padding pixel is inside the sub-image. – FirefoxMetzger Aug 10 '20 at 19:05
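A small illustration of that containment test, with rectangles laid along the first axis and padded pixels along the last (the same broadcasting trick with fewer axes). The padded pixels are the three corner pixels of example 2; the two rectangles are chosen for illustration:

```python
import numpy as np

# Padded pixels of example 2 (row, column), in the order np.where returns them
y_padded = np.array([0, 0, 1])
x_padded = np.array([2, 3, 3])

# Two candidate rectangles (y1, x1, y2, x2), corners inclusive
y1 = np.array([1, 0])
x1 = np.array([0, 0])
y2 = np.array([3, 2])
x2 = np.array([2, 2])

# Broadcast: rectangles along axis 0, padded pixels along axis 1 -> shape (2, 3)
y_inside = (y1[:, None] <= y_padded) & (y_padded <= y2[:, None])
x_inside = (x1[:, None] <= x_padded) & (x_padded <= x2[:, None])
# A rectangle is invalid if some padded pixel is inside it in both y and x
contains_border = np.any(y_inside & x_inside, axis=-1)
print(contains_border.tolist())  # [False, True]
```

The first rectangle (rows 1 to 3, columns 0 to 2) is the 9-pixel crop from the question and contains no padded pixel; the second includes the padded pixel at row 0, column 2.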

Please try this solution:

import numpy as np

def remove_zero_pad(image):
    # image is a 2-D (H, W) array; assume the background is zero
    dummy = np.argwhere(image != 0)
    max_y = dummy[:, 0].max()
    min_y = dummy[:, 0].min()
    min_x = dummy[:, 1].min()
    max_x = dummy[:, 1].max()
    # +1 because the max indices are inclusive
    crop_image = image[min_y:max_y + 1, min_x:max_x + 1]

    return crop_image
Pongthep
  • I believe this solution assumes that zeros appear only at the edge. While unlikely, it is still possible to have a zero spot somewhere within the image. The question is specific to cases of zero padding, that is, zeros that are contiguous all the way to the edge and usually form straight lines (not strictly vertical/horizontal). This solution would fail if a zero spot appears in the middle of the icon or is part of the actual image and not padding. – skbrhmn Mar 08 '22 at 21:15
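A quick check on a single 2-D channel of each example from the question, using a version of the bounding-box function with inclusive max bounds, shows where this approach stops working: it trims the straight strip in example 1 but returns example 2 unchanged, since every row and column there still contains a nonzero pixel.

```python
import numpy as np

def remove_zero_pad(image):
    # Bounding box of all nonzero pixels in a 2-D (H, W) image
    nonzero = np.argwhere(image != 0)
    min_y, min_x = nonzero.min(axis=0)
    max_y, max_x = nonzero.max(axis=0)
    return image[min_y:max_y + 1, min_x:max_x + 1]

example_1 = np.array([[1., 1., 1., 0.]] * 4)
example_2 = np.array([[1., 1., 0., 0.],
                      [1., 1., 1., 0.],
                      [1., 1., 1., 1.],
                      [1., 1., 1., 1.]])

print(remove_zero_pad(example_1).shape)  # (4, 3): straight strip removed
print(remove_zero_pad(example_2).shape)  # (4, 4): corner padding kept
```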