I have a numpy array of 0s and 1s. I need to remove, in one Pythonic move, all the rows that consist entirely of 0s, and keep the rest.

I have looked for previous questions answering this and it appears that this question is a duplicate of this: Remove all-zero rows in a 2D matrix

But I do not understand any of the answers. It looks like the important command is this:

a[~(a==0).all(1)]

but I do not understand how it extracts a matrix at all. In fact, when I use this line in my code it extracts an array, not a 2D matrix.

I have looked at the np.all() explanation, but it looks like it is just a test.

Can someone please help me out?

Baum mit Augen
Pietro Speroni
  • Start with the line of code and an example, and rather than doing it in one line, print out each step separately. Look at the result of each step and figure out what's happening from one step to the next. That's all someone here will do, and you'll understand it better if you do it yourself. – tom10 Feb 27 '16 at 17:42
  • There is no code in the answer in that specific question. And the lines that are presented if I run them as code give an error. – Pietro Speroni Feb 27 '16 at 17:48
  • `a[~(a==0).all(1)]` is the line of code I was referring to. – tom10 Feb 27 '16 at 17:49
  • When I run that line inside my code it makes me an array. As it should, as I would predict. It does NOT extract a 2D submatrix. – Pietro Speroni Feb 27 '16 at 17:51

2 Answers

The main problem with the line a[~(a==0).all(1)] is that it works for a numpy.array, but it seems that you are using a numpy.matrix, for which it doesn't quite work. If a is a numpy.matrix, use a[~(a==0).all(1).A1] instead.

Since you're new to numpy, I'll point out that complex single lines of code can be better understood by breaking them down into single steps and printing the intermediate results. This is usually the first step of debugging. I'll do this for the line a[~(a==0).all(1)] for both numpy.array and numpy.matrix.

For a numpy.array:

In [1]: from numpy import *

In [2]: a = array([[4, 1, 1, 2, 0, 4],
                   [3, 4, 3, 1, 4, 4],
                   [1, 4, 3, 1, 0, 0],
                   [0, 4, 4, 0, 4, 3],
                   [0, 0, 0, 0, 0, 0]])

In [3]: print(a==0)
[[False False False False  True False]
 [False False False False False False]
 [False False False False  True  True]
 [ True False False  True False False]
 [ True  True  True  True  True  True]]

In [6]: print((a==0).all(1))
[False False False False  True]

In [7]: print(~(a==0).all(1))
[ True  True  True  True False]

In [8]: print(a[~(a==0).all(1)])
[[4 1 1 2 0 4]
 [3 4 3 1 4 4]
 [1 4 3 1 0 0]
 [0 4 4 0 4 3]]
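
For anyone who prefers a script to a console session, the same steps can be sketched with named intermediates. The variable names (is_zero, row_all_zero, keep) are only illustrative, and the shorter any(axis=1) form at the end is an equivalent alternative rather than something from the linked question:

```python
import numpy as np

a = np.array([[4, 1, 1, 2, 0, 4],
              [3, 4, 3, 1, 4, 4],
              [1, 4, 3, 1, 0, 0],
              [0, 4, 4, 0, 4, 3],
              [0, 0, 0, 0, 0, 0]])

is_zero = (a == 0)                  # elementwise test, same shape as a: (5, 6)
row_all_zero = is_zero.all(axis=1)  # one bool per row, shape (5,)
keep = ~row_all_zero                # inverted: True for the rows to keep
result = a[keep]                    # a 1D boolean mask selects whole rows

# Equivalent shorter form: keep the rows that have at least one non-zero entry.
same = a[a.any(axis=1)]

print(result.shape)
print(np.array_equal(result, same))
```

The point to notice is that np.all is indeed "just a test": the extraction is done by the indexing a[keep], where a 1D array of booleans picks out whole rows.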

For a numpy.matrix:

In [1]: from numpy import *

In [2]: a = matrix([[4, 1, 1, 2, 0, 4],
                    [3, 4, 3, 1, 4, 4],
                    [1, 4, 3, 1, 0, 0],
                    [0, 4, 4, 0, 4, 3],
                    [0, 0, 0, 0, 0, 0]])

In [3]: print(a==0)
[[False False False False  True False]
 [False False False False False False]
 [False False False False  True  True]
 [ True False False  True False False]
 [ True  True  True  True  True  True]]


In [5]: print((a==0).all(1))
[[False]
 [False]
 [False]
 [False]
 [ True]]

In [6]: print((a==0).all(1).A1)
[False False False False  True]

In [7]: print(~(a==0).all(1).A1)
[ True  True  True  True False]

In [8]: print(a[~(a==0).all(1).A1])
[[4 1 1 2 0 4]
 [3 4 3 1 4 4]
 [1 4 3 1 0 0]
 [0 4 4 0 4 3]]

The output of In [5] shows why the original line doesn't work here: for a numpy.matrix, (a==0).all(1) produces a 2D (5x1) result, which can't be used as a mask to index the rows. Therefore I just tacked on .A1 in the next line to convert it to a 1D array.
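
A small sketch of the shape difference, using a shortened three-row example of my own. Converting to a plain array with np.asarray first is an alternative I'm assuming is acceptable for your use case, since it means the result is an ndarray rather than a matrix:

```python
import numpy as np

m = np.matrix([[4, 1, 1, 2, 0, 4],
               [0, 0, 0, 0, 0, 0],
               [1, 4, 3, 1, 0, 0]])

mask_2d = (m == 0).all(1)   # np.matrix always stays 2D: shape (3, 1)
mask_1d = mask_2d.A1        # flattened to a plain 1D ndarray: shape (3,)

kept = m[~mask_1d]          # the 1D boolean mask selects whole rows

# Alternative: work on a plain ndarray view of the same data.
arr = np.asarray(m)
kept_arr = arr[~(arr == 0).all(axis=1)]

print(mask_2d.shape, mask_1d.shape)
print(kept.shape, kept_arr.shape)
```

Both routes keep the same two rows; the only difference is whether you end up with a numpy.matrix or a numpy.array.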

Here is a good answer on the difference between arrays and matrices. To this I'll add that once the @ infix operator for matrix multiplication (PEP 465, Python 3.5) is fully adopted, there will be almost no advantage to using numpy.matrix. Also, because most people use numpy.array to represent matrices in their code, they will often describe a numpy.array as a "matrix", which creates confusion in the terminology.
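
To illustrate the point about the infix operator, here is a minimal sketch with two made-up 2x2 arrays. It assumes Python 3.5 or later, where @ is available:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])
y = np.array([[0, 1],
              [1, 0]])

elementwise = x * y    # numpy.array: * multiplies element by element
matmul = x @ y         # numpy.array: @ is the matrix product

mx, my = np.matrix(x), np.matrix(y)
matrix_star = mx * my  # numpy.matrix: * already means the matrix product

print(elementwise)
print(matmul)
print(np.array_equal(np.asarray(matrix_star), matmul))
```

With @ available, plain arrays give you a readable matrix product without the always-2D behaviour that caused the indexing problem above.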

Finally, as an aside I'll note that all of the above was done in ipython from the command line. IPython is an excellent tool for this type of work.

tom10
  • Sorry, what are those In [#]: prompts? Is this a debugging convention I am not aware of? – Pietro Speroni Feb 27 '16 at 18:31
  • This is a direct cut-and-paste from an ipython console session where `In[#]: ` is the prompt. Basically, you can just ignore it (though I would recommend looking into ipython for this type of work). – tom10 Feb 27 '16 at 18:39
  • @tom10 does this code work the same way when you define matrix instead of array? In Python 3 I get matrix([[4, 3, 1, 0]]) as result if I define a as a matrix – Mantxu Feb 27 '16 at 19:10
  • An `np.matrix` is always 2d; unless you are a confirmed MATLAB user I'd suggest avoiding this array subclass. – hpaulj Feb 27 '16 at 21:12
  • @PietroSperoni and @Mantxu: if `a` is an `np.matrix`, then the command should be `a[~(a==0).all(1).A1]`. But, as @hpaulj said, it's better to use `np.array` instead of `np.matrix`. (In retrospect, I see that the title says "matrix", but with numpy, usually by this people still mean `np.array`, so it's unclear which form the question is actually about.) – tom10 Feb 28 '16 at 02:31
  • Sorry, I am quite ignorant of numpy, so when I said matrix I actually meant matrix :-D – Pietro Speroni Feb 28 '16 at 10:30
  • @PietroSperoni: I think you're right to use "matrix", but it can be misunderstood for "array". For example, in the SO post you link to in your question, they use "matrix" but are probably referring to an `np.array`. Regardless, I'll edit my answer for an `np.matrix`. In general though, unless you have a compelling reason, I'd recommend using `np.array` instead. – tom10 Feb 28 '16 at 14:50
  • that would explain why their answer does not work on my code. Thanks! – Pietro Speroni Feb 28 '16 at 15:27
  • @PietroSperoni: I edited my answer to give a complete explanation. Probably excessive, but I hope it's helpful. – tom10 Feb 28 '16 at 16:19

This is a working example, maybe not the most efficient:

import numpy as np
m=np.matrix([[1,2,3],[0,0,0], [4,5,6]])
m_nonzero_rows = m[[i for i, x in enumerate(m) if x.any()]]

Here you extract the rows whose index numbers are in the list. The list is built from the index numbers of the rows that satisfy x.any(), which gives False only if every value in the row is 0.
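
Broken into separate steps, the one-liner above might look like the sketch below. The names keep_idx and keep_idx_vec are just for illustration, and the np.flatnonzero variant is an alternative of mine that avoids the Python-level loop, not part of the original line:

```python
import numpy as np

m = np.matrix([[1, 2, 3], [0, 0, 0], [4, 5, 6]])

# Step 1: enumerate(m) yields (index, row) pairs; each row is a 1x3 matrix.
# row.any() is True when the row has at least one non-zero value.
keep_idx = [i for i, row in enumerate(m) if row.any()]

# Step 2: indexing with a list of integers selects those rows.
m_nonzero_rows = m[keep_idx]

# Vectorized alternative for building the same index list.
keep_idx_vec = np.flatnonzero(np.asarray(m).any(axis=1))

print(keep_idx)
print(m_nonzero_rows)
```

Because the index is a plain list of integers rather than a boolean mask, this version behaves the same for numpy.matrix and numpy.array.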

Mantxu
  • Thanks, let me test it – Pietro Speroni Feb 27 '16 at 18:18
  • Oh! Now this works. I still need to understand how, but at least I have something to work on. – Pietro Speroni Feb 27 '16 at 18:21
  • I just added the definition of m just in case we are not talking about the same thing. I am not a numpy expert, but I have noticed differences between matrix and 2d arrays. Correct me if I am wrong please. I added some explanations as well. – Mantxu Feb 27 '16 at 18:21
  • Thanks. I have to admit I never saw this use of the double square brackets. Nor the FOR on two different variables. So much to learn! – Pietro Speroni Feb 27 '16 at 18:29
  • @PietroSperoni: These one-liners are tempting but hard to parse if you're not familiar with the notation. So that you can look this up, I'll point out that it's not really "double square brackets". It's an inner pair `[i for i...]`, which is a "list comprehension", a pure Python, non-numpy construct, within the square brackets used to index `m`. – tom10 Feb 27 '16 at 19:04
  • Please note that the double square bracket comes from having a list inside the brackets for access by index: m[list_of_index]. As you wanted the process in one shot, here the list is created in the same line. Note as well that the for loops through all the items in enumerate(m), which are unpacked into i and x, where i is the index and x is the content that m has at that index (the same value as when you do m[i]). – Mantxu Feb 27 '16 at 19:06