Computing Bounding Boxes from a Mask-Image (Tensorflow or other)

Question

I'm looking for ways to convert a mask (a Height x Width boolean image) into a series of bounding boxes (see example picture below, which I hand-drew), with boxes encircling the "islands of truth".

Specifically, I'm looking for a way that would work with standard TensorFlow ops (though all input is welcome). I want this so I can convert the model to TFLite without adding custom ops and recompiling from source. But in general it would just be nice to be aware of different ways of doing this.

Notes:

I already have a solution involving non-standard Tensorflow, based on tfa.image.connected_components (see solution here). However that op is not included in Tensorflow Lite. It also feels like it does something slightly harder than necessary (finding connected components feels harder than just outlining blobs on an image without worrying about whether they are connected or not)
I know I haven't specified here exactly how I'd like the boxes generated (e.g whether separate "ying-yang-style" connected components should have separate boxes even if they overlap, etc). Really I'm not worried about the details, just that the resulting boxes look "reasonable".
Some related questions (please read before flagging as duplicate!):
- Converting a binary mask into a bounding box in tensorflow asks about creating a single bounding box, which is significantly easier.
- Generating bounding boxes from heatmap data (similar, but asks the slightly broader question of converting from "heatmap", and does not specify Tensorflow).
- Create Bounding Boxes from Image Labels assumes the image has already been segmented into components (called "labels" there)
I'm ideally looking for something that does not need training (e.g. YOLO-style regression) and just works out of the box (heh).

Edit Here is an example mask image: https://github.com/petered/data/blob/master/images/example_mask3.png which can be loaded into a mask with

mask = cv2.imread(os.path.expanduser('~/Downloads/example_mask3.png')).mean(axis=2) > 50

Some pointer: https://github.com/keras-team/keras-cv/pull/193 — Innat, Aug 08 '22 at 20:43
Thanks - but doesn't really answer it, as looking at the code, it just generates one box per mask. In the above image it would just generate one box that encompasses all blobs. — Peter, Aug 08 '22 at 22:04

Peter · Answer 1 · 2022-08-10T00:29:03.967

Well, not sure if this is doable with just tensorflow ops, but here is a Python/Numpy implementation (which uses a very inefficient double-for loop). In principle, it should be fast if vectorized (again, not sure if possible) or written in C, because it just does 2 passes over the pixels to compute the boxes.

I'm not sure if this algorithm has an existing name, but if not I would call it Downright Boxing because it involves extending the mask-segments down and to the right in order to find boxes.

Here's the result on the mask in the question (with a few extra shapes added as examples):


def mask_to_boxes(mask: Array['H,W', bool]) -> Array['N,4', int]:
    """ Convert a boolean (Height x Width) mask into a (N x 4) array of NON-OVERLAPPING bounding boxes
    surrounding "islands of truth" in the mask.  Boxes indicate the (Left, Top, Right, Bottom) bounds
    of each island, with Right and Bottom being NON-INCLUSIVE (ie they point to the indices AFTER the island).

    This algorithm (Downright Boxing) does not necessarily put separate connected components into
    separate boxes.

    You can "cut out" the island-masks with
        boxes = mask_to_boxes(mask)
        island_masks = [mask[t:b, l:r] for l, t, r, b in boxes]
    """
    max_ix = max(s+1 for s in mask.shape)   # Use this to represent background
    # These arrays will be used to carry the "box start" indices down and to the right.
    x_ixs = np.full(mask.shape, fill_value=max_ix)
    y_ixs = np.full(mask.shape, fill_value=max_ix)

    # Propagate the earliest x-index in each segment to the bottom-right corner of the segment
    for i in range(mask.shape[0]):
        x_fill_ix = max_ix
        for j in range(mask.shape[1]):
            above_cell_ix = x_ixs[i-1, j] if i>0 else max_ix
            still_active = mask[i, j] or ((x_fill_ix != max_ix) and (above_cell_ix != max_ix))
            x_fill_ix = min(x_fill_ix, j, above_cell_ix) if still_active else max_ix
            x_ixs[i, j] = x_fill_ix

    # Propagate the earliest y-index in each segment to the bottom-right corner of the segment
    for j in range(mask.shape[1]):
        y_fill_ix = max_ix
        for i in range(mask.shape[0]):
            left_cell_ix = y_ixs[i, j-1] if j>0 else max_ix
            still_active = mask[i, j] or ((y_fill_ix != max_ix) and (left_cell_ix != max_ix))
            y_fill_ix = min(y_fill_ix, i, left_cell_ix) if still_active else max_ix
            y_ixs[i, j] = y_fill_ix

    # Find the bottom-right corners of each segment
    new_xstops = np.diff((x_ixs != max_ix).astype(np.int32), axis=1, append=False)==-1
    new_ystops = np.diff((y_ixs != max_ix).astype(np.int32), axis=0, append=False)==-1
    corner_mask = new_xstops & new_ystops
    y_stops, x_stops = np.array(np.nonzero(corner_mask))

    # Extract the boxes, getting the top-right corners from the index arrays
    x_starts = x_ixs[y_stops, x_stops]
    y_starts = y_ixs[y_stops, x_stops]
    ltrb_boxes = np.hstack([x_starts[:, None], y_starts[:, None], x_stops[:, None]+1, y_stops[:, None]+1])
    return ltrb_boxes

Well here's a tensorflow version - but it uses a doubly nested `tf.scan` making it VERY slow (100x slower than pure python if using eager execution, 20x slower if compiled, 10x slower if using tflite). https://gist.github.com/petered/5bb97f751bb59475fdfaa60d771e4663 — Peter, Aug 10 '22 at 22:44

Computing Bounding Boxes from a Mask-Image (Tensorflow or other)

1 Answers1

Linked