Alternative to a for loop for boolean / nonzero indexing of a numpy array

Question

I need to select only the non-zero 3d portions of a 3d binary array (or alternatively the true values of a boolean array). Currently I do this with a series of 'for' loops that use np.any. This works, but it seems awkward and slow, so I am looking for a more direct way to accomplish the task.

I am rather new to numpy. The approaches I have tried so far are a) np.nonzero, which returns indices that I don't know how to use for my purposes, b) boolean array indexing, and c) boolean masks. I can generally understand each of those approaches for simple 2d arrays, but I am struggling to understand the differences between them, and cannot get them to return the right values for a 3d array.

Here is my current function that returns a 3D array with nonzero values:

import numpy as np

def real_size(arr3):
    true_0 = []
    true_1 = []
    true_2 = []
    print(f'The input array shape is: {arr3.shape}')

    for zero_ in range (0, arr3.shape[0]):
        if arr3[zero_].any()==True:
            true_0.append(zero_)
    for one_ in range (0, arr3.shape[1]):
        if arr3[:,one_,:].any()==True:
            true_1.append(one_)
    for two_ in range (0, arr3.shape[2]):
        if arr3[:,:,two_].any()==True:
            true_2.append(two_)

    arr4 = arr3[min(true_0):max(true_0) + 1, min(true_1):max(true_1) + 1, min(true_2):max(true_2) + 1]
    print(f'The nonzero area is: {arr4.shape}')
    return arr4

# Then use it on a small test array:
test_array = np.zeros([2, 3, 4], dtype = int)
test_array[0:2, 0:2, 0:2] = 1

# The function call works and prints out as expected:
non_zero = real_size(test_array)
>> The input array shape is: (2, 3, 4) 
>> The nonzero area is: (2, 2, 2)

# So, the array is correct, but likely not the best way to get there:
non_zero

>> array([[[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]]])

The code works, but I am using it on much larger and more complex arrays, and I don't think this is an appropriate approach. Any thoughts on a more direct method would be greatly appreciated. I am also concerned about errors, and about what the result would be if the input array has, for example, two separate non-zero 3d areas within it.

To clarify the problem: starting from an original larger array, I need to return one or more 3D portions as one or more 3d arrays. The returned arrays should not include extraneous zeros (or false values) in any exterior plane in three-dimensional space. Just getting the indices of the nonzero values (or vice versa) doesn't by itself solve the problem.

What happens if the 1s are scattered randomly in your original array? — CristiFati, Aug 31 '19 at 16:59
I'm not sure *why* you need to do this, so I'm just guessing, but if you're trying to reduce the time and memory cost of operations on a sparse array, then you can look into sparse data structures such as those offered by scipy.sparse. — Andrew, Aug 31 '19 at 17:05
The application is to reduce down an array for a 3d printing application. So to rotate or translate the shape within the field, I only need to manipulate the positive voxel areas. — Professor Rumble Pony, Aug 31 '19 at 20:15

jhansen · Accepted Answer · 2019-08-31T19:36:45.220

Assuming you want to eliminate all rows, columns, etc. that contain only zeros, you could do the following:

nz = (test_array != 0)
non_zero = test_array[nz.any(axis=(1, 2))][:, nz.any(axis=(0, 2))][:, :, nz.any(axis=(0, 1))]
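To see what each nz.any(axis=...) term contributes, here is a small sketch that prints the per-axis masks for the test array from the question. The names keep_0, keep_1 and keep_2 are only illustrative and do not appear in the original code.

```python
import numpy as np

# Test array from the question: a 2x2x2 block of ones inside a 2x3x4 array
test_array = np.zeros([2, 3, 4], dtype=int)
test_array[0:2, 0:2, 0:2] = 1

nz = test_array != 0

# Reducing over the other two axes leaves one boolean per index along one axis:
# True where that plane contains at least one nonzero value.
keep_0 = nz.any(axis=(1, 2))
keep_1 = nz.any(axis=(0, 2))
keep_2 = nz.any(axis=(0, 1))

print(keep_0)  # [ True  True]
print(keep_1)  # [ True  True False]
print(keep_2)  # [ True  True False False]

# Apply the masks one axis at a time
non_zero = test_array[keep_0][:, keep_1][:, :, keep_2]
print(non_zero.shape)  # (2, 2, 2)
```

Each mask is applied in its own indexing step because mixing several boolean arrays in a single index selects matched elements rather than the outer product of the kept planes.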

An alternative solution using np.nonzero:

i = [np.unique(_) for _ in np.nonzero(test_array)]
non_zero = test_array[i[0]][:, i[1]][:, :, i[2]]
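Note that both snippets above drop every all-zero plane, including planes that lie between nonzero values, whereas the loop in the question keeps one contiguous bounding box. If the bounding box is what is wanted, a minimal sketch could take the minimum and maximum nonzero index along each axis. The helper name bounding_box is hypothetical, and the sketch assumes an array with at least one dimension.

```python
import numpy as np

def bounding_box(arr):
    """Crop arr to the smallest box that contains all nonzero entries (illustrative helper)."""
    idx = np.nonzero(arr)  # one index array per axis
    if idx[0].size == 0:
        # No nonzero values: return an empty array with the same number of dimensions
        return arr[(slice(0, 0),) * arr.ndim]
    box = tuple(slice(i.min(), i.max() + 1) for i in idx)
    return arr[box]  # basic slicing, so this is a view rather than a copy

# Two nonzero voxels separated by all-zero planes
a = np.zeros((5, 5, 5), dtype=int)
a[0, 0, 0] = 1
a[3, 2, 4] = 1

print(bounding_box(a).shape)  # (4, 3, 5)

# The mask-based approach removes the interior zero planes as well
nz = a != 0
masked = a[nz.any(axis=(1, 2))][:, nz.any(axis=(0, 2))][:, :, nz.any(axis=(0, 1))]
print(masked.shape)  # (2, 2, 2)
```

Neither variant splits the array into separate regions; returning one array per disconnected region would need some form of connected-component labelling, which is not covered here.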

This can also be generalized to arbitrary dimensions, but requires a bit more work (only showing the first approach here):

def real_size(arr):
    nz = (arr != 0)
    result = arr
    axes = np.arange(arr.ndim)
    for axis in range(arr.ndim):
        # Mask of positions along `axis` that contain at least one nonzero value
        keep = nz.any(axis=tuple(np.delete(axes, axis)))
        result = result[(slice(None),)*axis + (keep,)]
    return result

non_zero = real_size(test_array)
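As a quick sanity check, the generalized function can be tried on arrays of different rank. The function is restated below only so the block runs on its own, and the 2D array is made up for illustration.

```python
import numpy as np

def real_size(arr):
    # Same logic as the generalized function above
    nz = arr != 0
    result = arr
    axes = np.arange(arr.ndim)
    for axis in range(arr.ndim):
        # Reduce over every axis except the current one
        keep = nz.any(axis=tuple(np.delete(axes, axis)))
        result = result[(slice(None),) * axis + (keep,)]
    return result

# 2D case: only the middle row and two of the columns contain nonzero values
arr2d = np.array([[0, 0, 0, 0],
                  [0, 5, 0, 7],
                  [0, 0, 0, 0]])
print(real_size(arr2d))  # [[5 7]]

# 3D case: the test array from the question
arr3d = np.zeros((2, 3, 4), dtype=int)
arr3d[0:2, 0:2, 0:2] = 1
print(real_size(arr3d).shape)  # (2, 2, 2)
```

In the 2D case the all-zero column between 5 and 7 is removed too, which shows that this function compacts the array rather than cropping it to a bounding box.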
