I have a mask drawn over an apple using segmentation. The mask layer has 1s where the pixel is part of the apple and 0s everywhere else. How do I find the extreme pixels in the mask so that I can get the bounding box coordinates around it? I am using PyTorch and YolactEdge to perform the segmentation, as shown in Yolact.

Akshay Acharya
  • is `(mask == 1)` what you are looking for? – Minh-Long Luu Jul 07 '21 at 09:59
  • @AerysS I am trying to use torch.where, but the issue is: how do I find the extreme left, right, top and bottom pixels of the mask (basically xmin, ymin, xmax, ymax)? This returns all the pixels where the value is 1, right? How do I then find the extremes? – Akshay Acharya Jul 07 '21 at 10:01
  • If I understand the question correctly, then there will not always be exactly one "extreme" pixel in any direction. Imagine an arc centered on one of the image's corners: all points on this arc are equidistant from the corner, and thus all are "extreme". The best you can do is to call `np.where(pixels == 1)` and then take `np.min`/`np.max` over the `[0]` and `[1]` elements of the `where` result to find the possible set of "extremes" in each direction. Then figure out the most meaningful way to select one for your problem instance. – KDecker Jul 07 '21 at 15:49
  • I'd imagine you're drawing a bounding box around the mask, so selecting among the possible extremes should be easy, because you're drawing either a vertical or a horizontal line at each "extreme" point to enclose the entire mask. Thus any point from the possible set should work to provide the vertical or horizontal location of the line for the given edge of the bounding box. – KDecker Jul 07 '21 at 15:51

1 Answer

A relevant Stack Overflow answer has a nice explanation.

TL;DR: proposed code snippets (the second is faster):

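import numpy as np

def bbox1(img):
    # Row and column indices of every non-zero pixel
    a = np.where(img != 0)
    bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
    return bbox  # (rmin, rmax, cmin, cmax)

def bbox2(img):
    # Flag the rows and columns that contain at least one non-zero pixel
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    # First and last flagged index along each axis
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]

    return rmin, rmax, cmin, cmax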
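
Both snippets return the values in (rmin, rmax, cmin, cmax) order and fail on an all-zero mask. A minimal sketch of a wrapper that reorders the result to the (xmin, ymin, xmax, ymax) layout asked about in the question and returns None for an empty mask could look like this. The name `mask_to_xyxy` is made up for illustration, and the sketch assumes a single 2-D mask where rows correspond to y and columns to x:

```python
import numpy as np

def mask_to_xyxy(mask):
    """Return (xmin, ymin, xmax, ymax) for a 2-D mask, or None if it is empty."""
    rows = np.any(mask, axis=1)   # True for each row containing a mask pixel
    cols = np.any(mask, axis=0)   # True for each column containing a mask pixel
    if not rows.any():
        return None               # no non-zero pixels, so there is no box
    ymin, ymax = np.where(rows)[0][[0, -1]]
    xmin, xmax = np.where(cols)[0][[0, -1]]
    return int(xmin), int(ymin), int(xmax), int(ymax)

mask = np.zeros((6, 8), dtype=np.uint8)
mask[2:5, 3:7] = 1                # rows 2..4, columns 3..6
print(mask_to_xyxy(mask))         # (3, 2, 6, 4)
```

The returned coordinates are inclusive, i.e. xmax and ymax are the last column and row that still contain a mask pixel.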

But in the more general case (e.g. if there is more than one "instance" in the image and each mask is separated from the others), it may be worth considering OpenCV, specifically cv2.connectedComponentsWithStats.
A good description of this function can be found in another relevant answer.

num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)

Labels is a matrix the size of the input image where each element has a value equal to its label.

Stats is a matrix of the stats that the function calculates. It has a length equal to the number of labels and a width equal to the number of stats. It can be used with the OpenCV documentation for it:

Statistics output for each label, including the background label, see below for available statistics. Statistics are accessed via stats[label, COLUMN] where available columns are defined below.

  • cv2.CC_STAT_LEFT The leftmost (x) coordinate which is the inclusive start of the bounding box in the horizontal direction.
  • cv2.CC_STAT_TOP The topmost (y) coordinate which is the inclusive start of the bounding box in the vertical direction.
  • cv2.CC_STAT_WIDTH The horizontal size of the bounding box
  • cv2.CC_STAT_HEIGHT The vertical size of the bounding box
  • cv2.CC_STAT_AREA The total area (in pixels) of the connected component

Centroids is a matrix with the x and y locations of each centroid. The row in this matrix corresponds to the label number.

So, basically, the first 4 values of each row in stats determine the bounding box of the corresponding connected component (instance) in the mask.
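
To make the column layout concrete, here is a small sketch that turns rows of [x, y, w, h, area] into inclusive [xmin, ymin, xmax, ymax] corners. The helper name `stats_to_xyxy` is hypothetical, and the `example_stats` array is hand-written to stand in for the real `stats` output so that the snippet runs without OpenCV:

```python
import numpy as np

def stats_to_xyxy(stats):
    """Convert rows of [x, y, w, h, area] to inclusive [xmin, ymin, xmax, ymax]."""
    stats = np.asarray(stats)
    x, y, w, h = stats[:, 0], stats[:, 1], stats[:, 2], stats[:, 3]
    # left/top are inclusive starts, so the last covered pixel is start + size - 1
    return np.stack([x, y, x + w - 1, y + h - 1], axis=1)

# Hand-written stand-in for `stats`; row 0 plays the role of the background label.
example_stats = np.array([
    [0, 0, 8, 6, 36],   # background
    [3, 2, 4, 3, 12],   # one component: left=3, top=2, width=4, height=3
])
boxes = stats_to_xyxy(example_stats[1:])   # skip the background row
print(boxes.tolist())                      # [[3, 2, 6, 4]]
```

Whether you want the inclusive corner (x + w - 1) or the exclusive one (x + w) depends on what consumes the boxes, so treat the "- 1" as a choice rather than a rule.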

A possible function that you can use to return just the bounding boxes:

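import cv2

def get_bounding_boxes(mask, min_size=None):
    # mask must be an 8-bit single-channel image (e.g. mask.astype(np.uint8))
    num_components, labeled_image, bboxes, centroids = cv2.connectedComponentsWithStats(mask)
    # return bboxes in cv2 format [x, y, w, h] without background bbox and component size
    # (min_size is accepted but not used here)
    return bboxes[1:, :-1]
# (x, y) is the top-left corner and (x+w, y+h) the (exclusive) bottom-right corner of each box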

And of course, in the case of a single instance, this approach still works.
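
Since the question mentions PyTorch, the single-mask case can also be sketched directly on a tensor, without converting to NumPy first. This is only an illustration under the assumption that the mask is a 2-D tensor of shape (H, W); the function name `mask_to_box` is made up here:

```python
import torch

def mask_to_box(mask: torch.Tensor):
    """Return (xmin, ymin, xmax, ymax) for a 2-D mask tensor, or None if empty."""
    ys, xs = torch.where(mask != 0)   # row and column indices of mask pixels
    if ys.numel() == 0:
        return None                   # empty mask, no bounding box
    return xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item()

mask = torch.zeros(6, 8)
mask[2:5, 3:7] = 1                    # rows 2..4, columns 3..6
print(mask_to_box(mask))              # (3, 2, 6, 4)
```

For a batch of masks of shape (N, H, W), recent torchvision releases also provide torchvision.ops.masks_to_boxes, which may be worth checking before rolling your own.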