
I have an array that might look like this:

ANOVAInputMatrixValuesArray = [[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                               [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]]

Notice that one of the rows has a zero value at the end. I want to delete any row that contains a zero, while keeping any row that contains non-zero values in all cells.

But the array will have different numbers of rows every time it is populated, and the zeros will be located in different rows each time.

I get the number of non-zero elements in each row with the following line of code:

NumNonzeroElementsInRows    = (ANOVAInputMatrixValuesArray != 0).sum(1)

For the array above, NumNonzeroElementsInRows contains: [5 4]

The five indicates that all possible values in row 0 are nonzero, while the four indicates that one of the possible values in row 1 is a zero.

Therefore, I am trying to use the following lines of code to find and delete rows that contain zero values.

for q in range(len(NumNonzeroElementsInRows)):
    if NumNonzeroElementsInRows[q] < NumNonzeroElementsInRows.max():
        p.delete(ANOVAInputMatrixValuesArray, q, axis=0)

But for some reason, this code does not seem to do anything, even though print statements show that all of the variables are populated correctly leading up to it.

There must be some easy way to simply "delete any row that contains a zero value."

Can anyone show me what code to write to accomplish this?

edited by Unihedron; asked by MedicalMath

6 Answers


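The simplest way to delete rows and columns from arrays is the numpy.delete function. Note that it returns a new array and leaves its input unchanged, so you have to assign the result.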

Suppose I have the following array x:

import numpy
x = numpy.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

To delete the first row, do this:

x = numpy.delete(x, (0), axis=0)

To delete the third column, do this:

x = numpy.delete(x,(2), axis=1)

So you could find the indices of the rows which have a 0 in them, put them in a list or a tuple and pass this as the second argument of the function.
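A minimal sketch of that suggestion, assuming the question's values: `np.where` on a per-row "contains a zero" test yields the row indices, which are then passed to `numpy.delete` along axis 0. The name `zero_rows` is only illustrative.

```python
import numpy as np

arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])

# Indices of the rows that contain at least one zero
zero_rows = np.where((arr == 0).any(axis=1))[0]

# numpy.delete returns a new array, so assign it back
arr = np.delete(arr, zero_rows, axis=0)
print(arr.shape)  # (1, 5)
```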

edited by MERose; answered by Jaidev Deshpande
  • Thanks! I had the same problem, and I could not figure out why simply calling `numpy.delete(x, index)` didn't work. – Antimony Nov 20 '15 at 22:59
  • note that the [numpy delete() docs](https://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html) indicate that "Often it is preferable to use a boolean mask" since a new array is returned - an example is provided under that link – arturomp Oct 27 '16 at 03:42
  • @arturomp but the mask is nondestructive. Is a call to delete() time/memory consuming? – Nathan majicvr.com Mar 28 '18 at 18:19
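To illustrate the point raised in these comments: `numpy.delete` never modifies its input, which is why calling it without assigning the result (as in the question's loop) appears to do nothing. A boolean mask, as the docs suggest, selects the same rows and also leaves the original untouched. This sketch reuses the 3×3 `x` from the answer; `without_first`, `mask` and `kept_by_mask` are illustrative names.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Calling delete without assigning the result leaves x unchanged
np.delete(x, 0, axis=0)
print(x.shape)  # (3, 3)

# Assigning the returned array is what actually "removes" the row
without_first = np.delete(x, 0, axis=0)

# Equivalent boolean mask: True marks the rows to keep
mask = np.ones(x.shape[0], dtype=bool)
mask[0] = False
kept_by_mask = x[mask]

print(np.array_equal(without_first, kept_by_mask))  # True
```

Both forms build a new array; neither changes `x` in place.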

Here's a one-liner (yes, it is similar to user333700's, but a little more straightforward):

>>> import numpy as np
>>> arr = np.array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
...                 [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
>>> arr[arr.all(1)]
array([[ 0.96488889,  0.73641667,  0.67521429,  0.592875  ,  0.53172222]])

By the way, this method is much faster than the masked array method for large matrices: for a 2048 x 5 matrix, it is about 1000x faster.

user333700's method (from his comment), however, was slightly faster in my tests, though it boggles my mind why.

answered by Justin Peel
  • "any" can short-circuit, as soon as the first true case is detected, it can stop, while "all" has to check all conditions. So, not ("~" in numpy) any, should in general be faster than all. – Josef Oct 11 '10 at 02:23
  • @user333700, both of them can short-circuit, just to different things. `any` short-circuits to true at the first true case detected; `all` short-circuits to false at the first false case detected. In this case, the short-circuiting should be a draw, but doing the extra not should make it slower in my opinion. – Justin Peel Oct 11 '10 at 02:28
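A small check that the two forms discussed here keep the same rows. The `~`/`any` line is an assumption about what user333700's "not any" version looks like, based only on the comment above. Which one is faster will depend on the data and NumPy version, so time both on your own arrays if it matters.

```python
import numpy as np

arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])

# Keep rows in which every element is non-zero (truthy)
keep_all = arr[arr.all(axis=1)]

# Keep rows in which no element equals zero ("not any", with ~ as logical not)
keep_not_any = arr[~(arr == 0).any(axis=1)]

print(np.array_equal(keep_all, keep_not_any))  # True
```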

This is similar to your original approach, and will use less space than unutbu's answer, but I suspect it will be slower.

>>> import numpy as np
>>> p = np.array([[1.5, 0], [1.4,1.5], [1.6, 0], [1.7, 1.8]])
>>> p
array([[ 1.5,  0. ],
       [ 1.4,  1.5],
       [ 1.6,  0. ],
       [ 1.7,  1.8]])
>>> nz = (p == 0).sum(1)
>>> q = p[nz == 0, :]
>>> q
array([[ 1.4,  1.5],
       [ 1.7,  1.8]])

By the way, your line p.delete() doesn't work for me: ndarrays don't have a .delete attribute.
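The question's own `NumNonzeroElementsInRows` can drive the same kind of boolean selection: a row is free of zeros exactly when its non-zero count equals the number of columns. A sketch, assuming the question's list is first converted with `np.array`; `n_cols` is an illustrative name.

```python
import numpy as np

ANOVAInputMatrixValuesArray = np.array(
    [[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
     [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])

# Same count as in the question: non-zero entries per row
NumNonzeroElementsInRows = (ANOVAInputMatrixValuesArray != 0).sum(1)

# Keep rows where every column is non-zero
n_cols = ANOVAInputMatrixValuesArray.shape[1]
ANOVAInputMatrixValuesArray = ANOVAInputMatrixValuesArray[NumNonzeroElementsInRows == n_cols]
print(ANOVAInputMatrixValuesArray.shape)  # (1, 5)
```

Comparing against the column count rather than `NumNonzeroElementsInRows.max()` also behaves correctly when every row contains a zero, because in that case the maximum is smaller than the row length and the `.max()` test would keep rows that still contain zeros.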

edited by Community; answered by mtrw

NumPy provides a simple function that does exactly this: if you have a masked array `a`, calling numpy.ma.compress_rows(a) deletes the rows that contain a masked value. I would guess this is also much faster.
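A minimal sketch, assuming the zeros are what should count as "masked": `numpy.ma.masked_equal` builds the masked array from the question's values, and `numpy.ma.compress_rows` drops every row that has a masked entry. The speed claim above is not checked here.

```python
import numpy as np

arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])

# Mask every element that is exactly zero
masked = np.ma.masked_equal(arr, 0)

# Drop every row that contains at least one masked element
result = np.ma.compress_rows(masked)
print(result.shape)  # (1, 5)
```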

answered by jeps
import numpy as np
arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
# Test each row as a whole; np.where then returns the indices of rows with no zeros
print(arr[np.where((arr != 0).all(axis=1))])
answered by Prokhozhii

I might be too late to answer this question, but I wanted to share my input for the benefit of the community. For this example, let me call your matrix 'ANOVA'. I am assuming you only need to remove rows that have a 0 in the 5th column.

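import numpy as np

indx = []
for i in range(len(ANOVA)):
    # Compare the float directly: int() would truncate values like 0.5 to 0
    if ANOVA[i, 4] == 0:
        indx.append(i)

# Keep the rows whose index is not in indx
ANOVA = np.array([row for i, row in enumerate(ANOVA) if i not in indx])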
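Under the same assumption (zeros can only appear in the fifth column, index 4), a boolean mask on that column does the same job without a Python loop. In this sketch `ANOVA` is filled with the question's values purely for illustration.

```python
import numpy as np

ANOVA = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                  [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])

# Keep only the rows whose fifth column (index 4) is non-zero
ANOVA = ANOVA[ANOVA[:, 4] != 0]
print(ANOVA.shape)  # (1, 5)
```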
answered by troymyname00