I am dealing with arrays created via numpy.array(), and I need to draw points on a canvas simulating an image. Since there are a lot of zero values around the central part of the array, which contains the meaningful data, I would like to "trim" the array, erasing the outer columns and rows that contain only zeros.

So, I would like to know of a native NumPy function, or even a code snippet, to "trim" the array or find a "bounding box" so that I can slice out only the data-containing part of the array.

(Since it is a conceptual question, I did not include any code; sorry if I should have. I'm very new to posting on SO.)

Thanks for reading

Saullo G. P. Castro
heltonbiker
  • See the bbox2 function at http://stackoverflow.com/questions/31400769/bounding-box-of-numpy-array. It is MUCH faster if there are many rows/columns entirely filled with zeros and only a small amount of clustered data. – Benjamin Oct 20 '16 at 17:35

3 Answers

This should do it:

from numpy import array, argwhere

A = array([[0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0, 0],
           [0, 0, 1, 1, 0, 0, 0],
           [0, 0, 0, 0, 1, 0, 0],
           [0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0]])

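B = argwhere(A)  # (row, col) index of every nonzero element
# Smallest index along each axis, and one past the largest (for slicing)
(ystart, xstart), (ystop, xstop) = B.min(0), B.max(0) + 1
Atrim = A[ystart:ystop, xstart:xstop]  # [[1, 0, 0], [1, 1, 0], [0, 0, 1]]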
Paul
  • Nice! Just on a readability note, you could do `(ystart, xstart), (ystop, xstop) = B.min(0), B.max(0) + 1` and then simply index `A` with `Atrim = A[ystart:ystop, xstart:xstop]`. Of course, it's entirely equivalent, but I find it more readable, at any rate. – Joe Kington Jan 26 '11 at 19:51
  • This one was fine; the example you used is exactly the typical array I would be using (just larger). I didn't know the function argwhere, so I'll do my homework now. Thanks! – heltonbiker Jan 27 '11 at 17:41
  • @Paul Thanks, you helped me :-) – Necromancer Jun 14 '16 at 05:32
  • Is there a way to do this for an array of any dimension? – Naomi Fridman Aug 24 '17 at 11:40
  • @Naomi Sure. Just extend the patterns in this example by adding a `zstart` after the `ystart` and `xstart` for 3 dimensions, and keep adding more for higher dimensions. – Paul Aug 24 '17 at 13:17
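Rather than writing out one `start`/`stop` pair per axis by hand, the same argwhere pattern can build a tuple of slices for any number of dimensions. This is a minimal sketch; `bbox_slices` is a hypothetical helper name, and raising `ValueError` on an all-zero array is an assumed choice (plain `argwhere(...).min()` would fail on an empty result anyway).

```python
import numpy as np

def bbox_slices(a):
    """Return a tuple of slices bounding the nonzero region of an n-d array.

    Hypothetical helper generalizing the argwhere answer to any ndim.
    Raises ValueError if the array has no nonzero elements.
    """
    idx = np.argwhere(a)              # shape (num_nonzero, a.ndim)
    if idx.size == 0:
        raise ValueError("array has no nonzero elements")
    start = idx.min(axis=0)           # lowest nonzero index along each axis
    stop = idx.max(axis=0) + 1        # one past the highest nonzero index
    return tuple(slice(b, e) for b, e in zip(start, stop))

# 3-D example: a 2 x 2 x 3 block of ones inside a 5 x 6 x 7 volume
vol = np.zeros((5, 6, 7))
vol[1:3, 2:4, 3:6] = 1
trimmed = vol[bbox_slices(vol)]
print(trimmed.shape)  # (2, 2, 3)
```

Returning slices instead of the trimmed array also lets you apply the same crop to another array of the same shape.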

The code below, from this answer, runs fastest in my tests:

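import numpy as np

def bbox2(img):
    rows = np.any(img, axis=1)  # True for each row containing a nonzero value
    cols = np.any(img, axis=0)  # True for each column containing a nonzero value
    ymin, ymax = np.where(rows)[0][[0, -1]]  # first and last nonzero row
    xmin, xmax = np.where(cols)[0][[0, -1]]  # first and last nonzero column
    return img[ymin:ymax+1, xmin:xmax+1]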

The accepted answer using argwhere worked but ran slower. My guess is that this is because argwhere allocates a large output array of indices (one row per nonzero element), whereas np.any only produces one boolean per row and one per column. I tested on a large 2D array (a 1024 x 1024 image with a roughly 50 x 100 nonzero region).
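To reproduce this kind of comparison yourself, a minimal timing sketch with the standard `timeit` module might look like the following. The test image (1024 x 1024 zeros with a 50 x 100 block of ones) is an assumption modeled on the description above, and the function names `trim_argwhere` and `trim_any` are hypothetical; no particular timings are claimed, since they depend on the machine and NumPy version.

```python
import timeit
import numpy as np

def trim_argwhere(a):
    # Accepted-answer approach: coordinates of every nonzero element
    idx = np.argwhere(a)
    (y0, x0), (y1, x1) = idx.min(0), idx.max(0) + 1
    return a[y0:y1, x0:x1]

def trim_any(a):
    # bbox2 approach: reduce to 1-D row/column masks first
    rows = np.any(a, axis=1)
    cols = np.any(a, axis=0)
    y0, y1 = np.where(rows)[0][[0, -1]]
    x0, x1 = np.where(cols)[0][[0, -1]]
    return a[y0:y1 + 1, x0:x1 + 1]

# Assumed test image: 1024 x 1024 zeros with a 50 x 100 block of ones
img = np.zeros((1024, 1024), dtype=np.uint8)
img[400:450, 300:400] = 1

# Both approaches must agree before timing them
assert np.array_equal(trim_argwhere(img), trim_any(img))

for fn in (trim_argwhere, trim_any):
    t = timeit.timeit(lambda: fn(img), number=20)
    print(f"{fn.__name__}: {t / 20 * 1e3:.3f} ms per call")
```

The larger the nonzero region, the more index rows argwhere has to materialize, so the gap between the two should depend on how much data the image contains.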

heltonbiker
Luke
  • I found this answer way more pythonic! Thanks! – heltonbiker Jun 24 '17 at 15:03
  • Caution: this code raises an IndexError in the edge case of a completely black (all-zero) image. You must verify that neither of the two `np.where()` calls returns an empty array. – Delgan Oct 06 '17 at 17:34
  • This is great! Any idea on how to extend it with periodic boundary conditions? – Tropilio Dec 11 '19 at 15:28
  • @Tropilio I'm not sure I understand what you mean by periodic boundary conditions. But if you're looking to find multiple "blobs" of contiguous True values, an approach like this answer probably won't work. Instead, to find arbitrary blobs, I'd use the OpenCV connectedComponents() function: https://docs.opencv.org/3.4/d3/dc0/group__imgproc__shape.html#gaedef8c7340499ca391d459122e51bef5 – Luke Dec 31 '19 at 05:38
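One way to handle the all-zero edge case raised in the comments is to check the row mask before indexing. This is a sketch; `bbox2_safe` is a hypothetical name, and returning `None` is an assumed convention (returning an empty slice such as `img[0:0, 0:0]` would be an equally valid choice). Checking only `rows` is enough, because any nonzero pixel makes both a row and a column nonempty.

```python
import numpy as np

def bbox2_safe(img):
    """bbox2 variant that returns None for an all-zero image (hypothetical name)."""
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    if not rows.any():  # no nonzero pixel anywhere, so np.where would be empty
        return None
    ymin, ymax = np.where(rows)[0][[0, -1]]
    xmin, xmax = np.where(cols)[0][[0, -1]]
    return img[ymin:ymax + 1, xmin:xmax + 1]

print(bbox2_safe(np.zeros((4, 4))))  # None

img = np.zeros((5, 5), dtype=int)
img[1:3, 2:4] = 7
print(bbox2_safe(img).shape)  # (2, 2)
```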

Something like:

import numpy as np
empty_cols = np.all(array == 0, axis=0)  # True where a column is all zeros
empty_rows = np.all(array == 0, axis=1)  # True where a row is all zeros

The resulting arrays are 1D boolean arrays. Loop over them from both ends to find the first and last non-empty row and column; these give the 'bounding box'.
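A minimal sketch of that scan, assuming a 2-D array with at least one nonzero element (for an all-zero array the loops would run off the end and raise IndexError). `bbox_by_scanning` is a hypothetical name. Each loop walks only a 1-D mask, which is the point made in the comments below.

```python
import numpy as np

def bbox_by_scanning(array):
    """Trim all-zero border rows/columns by scanning the 1-D masks from both ends.

    Assumes a 2-D array with at least one nonzero element.
    """
    empty_cols = np.all(array == 0, axis=0)
    empty_rows = np.all(array == 0, axis=1)

    top, bottom = 0, len(empty_rows)
    while empty_rows[top]:          # skip leading all-zero rows
        top += 1
    while empty_rows[bottom - 1]:   # skip trailing all-zero rows
        bottom -= 1

    left, right = 0, len(empty_cols)
    while empty_cols[left]:         # skip leading all-zero columns
        left += 1
    while empty_cols[right - 1]:    # skip trailing all-zero columns
        right -= 1

    return array[top:bottom, left:right]

A = np.array([[0, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 0]])
print(bbox_by_scanning(A))  # [[1 0]
                            #  [0 1]]
```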

kiyo
  • Looping over numpy arrays should be avoided. – Paul Jan 26 '11 at 19:32
  • The loop is only 1D, so order n, not n^2. Not that big of a deal. – kiyo Jan 27 '11 at 13:13
  • You are right about the order, and you don't even require a loop over the entire array width, but the Python loop contains all kinds of extra steps like type-checking. In this 1D example (http://www.scipy.org/Getting_Started#head-9aed725bd569d40f625240b2b6ec710550ff14b9), the Python loop runs 25x slower to accomplish the same task! Without knowing the size or quantity of the images or the application of the algorithm (computer vision?), I can't say how big a deal that kind of speedup is. – Paul Jan 27 '11 at 14:41