This is essentially the 2D array equivalent of slicing a Python list into smaller lists at the indexes that store a particular value. I'm running a program that extracts a large amount of data from a CSV file and copies it into a 2D NumPy array. The basic format of these arrays is something like this:

[[0 8 9 10]
[9 9 1 4]
[0 0 0 0]
[1 2 1 4]
[0 0 0 0]
[1 1 1 2]
[39 23 10 1]]

I want to separate my NumPy array along rows that contain all zero values to create a set of smaller 2D arrays. The successful result for the above starting array would be the arrays:

[[0 8 9 10]
[9 9 1 4]]

[[1 2 1 4]]

[[1 1 1 2]
[39 23 10 1]]

I've thought about simply iterating down the array and checking whether each row is all zeros, but the data I'm handling is very large. I have potentially millions of rows of data in the text file, and I'm looking for the most efficient approach rather than a loop that could waste computation time. What are your thoughts on what I should do? Is there a better way?

Ehsan
M3NT0

2 Answers

Assuming a is your array, you can use any to find the rows that are not all zeros, select them, and then use np.split to split the result at the positions where the all-zero rows used to be:

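import numpy as np

# indices of rows that contain at least one nonzero value
idx = np.flatnonzero(a.any(axis=1))
# indices of rows that are all zeros
idx_zero = np.delete(np.arange(a.shape[0]), idx)
# keep the nonzero rows, then split at the positions the zero rows occupied
# (each removed zero row shifts the later split points up by one)
output = np.split(a[idx], idx_zero - np.arange(idx_zero.size))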

output:

[array([[ 0,  8,  9, 10],
       [ 9,  9,  1,  4]]), 
 
 array([[1, 2, 1, 4]]), 
 
 array([[ 1,  1,  1,  2],
       [39, 23, 10,  1]])]
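
One caveat with the split above: if the data has consecutive all-zero rows, or an all-zero row at the very start or end, np.split returns empty arrays for those positions. A minimal sketch of one way to handle that, assuming empty groups are unwanted (the function name split_at_zero_rows and the sample data are hypothetical, not part of the original answer):

```python
import numpy as np

def split_at_zero_rows(a):
    """Hypothetical helper: same approach as above, with empty chunks dropped."""
    idx = np.flatnonzero(a.any(axis=1))                # rows with a nonzero value
    idx_zero = np.delete(np.arange(a.shape[0]), idx)   # rows that are all zeros
    chunks = np.split(a[idx], idx_zero - np.arange(idx_zero.size))
    # Leading, trailing, or adjacent zero rows yield zero-length chunks.
    return [c for c in chunks if c.shape[0] > 0]

# Sample data (an assumption): zero rows at the start, doubled in the middle, and at the end.
a = np.array([[0, 0, 0, 0],
              [0, 8, 9, 10],
              [9, 9, 1, 4],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [1, 2, 1, 4],
              [0, 0, 0, 0]])

groups = split_at_zero_rows(a)
for g in groups:
    print(g)
```

Without the final filter, the same input would give five chunks, three of them empty.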
Ehsan

You can use the np.all function to find the rows that are all zeros, and then index with the resulting boolean mask.

import numpy as np

# assume `x` is your data (a 2D array)
indices = np.all(x == 0, axis=1)  # True for rows that are all zeros
zeros = x[indices]
nonzeros = x[~indices]

The all function accepts an axis argument (as do many NumPy functions), which indicates the axis along which to operate. Here axis=1 reduces across the columns of each row, so you get back one boolean per row, in an array of shape (x.shape[0],), which can be used to index x directly.
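
To make the effect of the axis argument concrete, here is a small sketch on made-up data (the array and the variable names row_is_zero and col_is_zero are assumptions for illustration):

```python
import numpy as np

# Made-up 3x4 example with one all-zero row.
x = np.array([[0, 8, 9, 10],
              [0, 0, 0, 0],
              [1, 2, 1, 4]])

row_is_zero = np.all(x == 0, axis=1)  # reduce across columns: one value per row
col_is_zero = np.all(x == 0, axis=0)  # reduce across rows: one value per column

print(row_is_zero.shape)  # (3,)
print(col_is_zero.shape)  # (4,)

# The per-row mask can index the rows of x directly.
print(x[~row_is_zero])
```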

Note that this will typically be much faster than a Python for-loop over the rows, especially for large arrays, because the comparison and the reduction run in NumPy's compiled code.
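
The mask on its own separates zero rows from nonzero rows but does not group the nonzero rows into blocks. One possible way to get from the mask to the blocks described in the question is to locate where runs of nonzero rows start and stop. This is a sketch, not part of the original answer; the function name split_on_zero_rows is hypothetical:

```python
import numpy as np

def split_on_zero_rows(x):
    """Hypothetical helper: return the runs of consecutive non-zero rows of x."""
    keep = ~np.all(x == 0, axis=1)               # True for rows to keep
    # Pad with False so runs touching the first or last row are detected too.
    padded = np.concatenate(([False], keep, [False]))
    change = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(change == 1)         # first row of each run
    stops = np.flatnonzero(change == -1)         # one past the last row of each run
    # Basic slices are views, so the row data is not copied.
    return [x[s:e] for s, e in zip(starts, stops)]

# The array from the question.
x = np.array([[0, 8, 9, 10],
              [9, 9, 1, 4],
              [0, 0, 0, 0],
              [1, 2, 1, 4],
              [0, 0, 0, 0],
              [1, 1, 1, 2],
              [39, 23, 10, 1]])

groups = split_on_zero_rows(x)
for g in groups:
    print(g)
```

The only Python-level loop here runs once per block rather than once per row, and consecutive or leading all-zero rows do not produce empty blocks.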

bnaecker