5

I have a question: how can I get a submatrix by boolean indexing, the same way I get a subarray?

For example:

    a2 = np.array(np.arange(30).reshape(5, 6))
    a2[a2[:, 1] > 10]

will give me:

    array([[12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23],
           [24, 25, 26, 27, 28, 29]])

but:

    m2 = np.mat(np.arange(30).reshape(5, 6))
    m2[m2[:, 1] > 10]

will give me:

    matrix([[12, 18, 24]])

Why is the output different, and how can I get the same result from a matrix as from an array?

Thank you!

Saullo G. P. Castro
pinseng

2 Answers

4

The issue you're experiencing comes down to the fact that operations on a matrix always return a 2-dimensional result.

When you build the mask on the first array, you get:

    In [24]: a2[:,1] > 10
    Out[24]: array([False, False,  True,  True,  True], dtype=bool)

which, as you can see, is a 1-dimensional array.

When you do the same thing with the matrix, you get:

    In [25]: m2[:,1] > 10
    Out[25]:
    matrix([[False],
            [False],
            [ True],
            [ True],
            [ True]], dtype=bool)

In other words, you have an n×1 matrix, not an array of length n.
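
The shape difference can be checked directly. This is a minimal sketch that builds the matrix with `np.asmatrix` rather than `np.mat`; that substitution is my assumption for newer NumPy releases, where the `np.mat` alias is no longer available:

```python
import numpy as np

a2 = np.arange(30).reshape(5, 6)
m2 = np.asmatrix(a2)  # same data, viewed as a matrix

array_mask = a2[:, 1] > 10   # a column slice of an array drops to 1-D
matrix_mask = m2[:, 1] > 10  # a column slice of a matrix stays 2-D

print(array_mask.shape)   # (5,)
print(matrix_mask.shape)  # (5, 1)
```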


Boolean indexing in numpy behaves differently depending on whether the mask is 1-dimensional or n-dimensional.

In your first case, numpy applies the mask of length n along the first axis, so it selects whole rows and you get the expected result:

    In [28]: a2[a2[:,1] > 10]
    Out[28]:
    array([[12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23],
           [24, 25, 26, 27, 28, 29]])

In the second case the mask is 2-dimensional, so numpy treats it as selecting individual elements by row and column. The mask has a single column, so it only grabs elements from the matching column (the first one):

    In [29]: m2[m2[:,1] > 10]
    Out[29]: matrix([[12, 18, 24]])
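
To see element-wise selection by a 2-dimensional mask in isolation, here is a sketch on a plain array with a full-shape mask, which avoids any shape mismatch. As far as I know, recent NumPy releases reject an n×1 mask on an n×6 matrix with an `IndexError` instead of returning the output above, so the second half of the sketch only reports which behaviour your version has. `mask2d` and `outcome` are names made up for this illustration:

```python
import numpy as np

a2 = np.arange(30).reshape(5, 6)

# Full-shape 2-D mask that is True only in column 0 of the selected rows.
mask2d = np.zeros(a2.shape, dtype=bool)
mask2d[:, 0] = a2[:, 1] > 10

picked = a2[mask2d]  # selects individual elements, not whole rows
print(picked)        # [12 18 24]

# The n x 1 matrix mask from the question: the result depends on the NumPy version.
m2 = np.asmatrix(a2)
try:
    outcome = m2[m2[:, 1] > 10]
except IndexError as exc:
    outcome = exc
print(type(outcome).__name__)
```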

To answer your question: you can get the array-style behaviour by converting the mask to an array and taking its first column, which recovers the 1-dimensional mask of length n:

    In [32]: m2[np.array(m2[:,1] > 10)[:,0]]
    Out[32]:
    matrix([[12, 13, 14, 15, 16, 17],
            [18, 19, 20, 21, 22, 23],
            [24, 25, 26, 27, 28, 29]])

Alternatively, you could convert the matrix to an array first, which gives the same 1-dimensional mask as in the array case:

    In [34]: np.array(m2)[:,1] > 10
    Out[34]: array([False, False,  True,  True,  True], dtype=bool)
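
Applying that 1-dimensional mask completes the second approach. This is a sketch, again assuming `np.asmatrix` as the constructor; the mask works on either type, and the result keeps the type of whatever is being indexed:

```python
import numpy as np

m2 = np.asmatrix(np.arange(30).reshape(5, 6))

mask = np.asarray(m2)[:, 1] > 10  # 1-D boolean mask, shape (5,)

rows_as_matrix = m2[mask]             # still a matrix, shape (3, 6)
rows_as_array = np.asarray(m2)[mask]  # plain ndarray, shape (3, 6)

print(type(rows_as_matrix).__name__, rows_as_matrix.shape)
print(type(rows_as_array).__name__, rows_as_array.shape)
```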

Now, both of those fixes require conversions between matrices and arrays, which can be pretty ugly.

The question to ask yourself is why you want to use a matrix and yet expect the behaviour of an array. It could be that the right tool for your job is actually an array, not a matrix.
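
If the matrix type is only there for matrix multiplication, plain arrays can cover that too. This is a sketch, assuming Python 3.5 or later, where the `@` operator is available; `weights` is a made-up vector for illustration and not part of the question:

```python
import numpy as np

a2 = np.arange(30).reshape(5, 6)
weights = np.ones(6)  # hypothetical vector, not from the question

selected = a2[a2[:, 1] > 10]  # boolean row selection, shape (3, 6)
product = selected @ weights  # matrix-vector product on plain arrays

print(product)  # the row sums of the selected rows: 87, 123, 159
```

With this approach the boolean indexing stays in the array world, and the matrix product is spelled out explicitly where it is needed.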

sapi
  • Thank you, Sapi. I thought array and matrix were equivalent in numpy, but they are not. – pinseng Sep 18 '14 at 16:19
  • @pinseng As a general rule, it's best to stick to array unless you actually need to do matrix math (in which case the convenience of defined matrix operations can outweigh the problems seen here) – sapi Sep 18 '14 at 22:04
  • Hi Sapi, may I ask another question: why do `np.array(2)` and `np.array([2])` give different results, while `np.mat(2)` and `np.mat([2])` give the same result? – pinseng Dec 05 '14 at 19:35
1

If you flatten the boolean mask like:

    m2[np.asarray(m2[:,1]>10).flatten()]

you get the same result, but I would recommend using np.array instead of np.matrix for the reasons given in this answer.
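
A shorter spelling of the same idea is the matrix attribute `A1`, which returns the data as a flattened ndarray. This is a sketch comparing it with the `flatten` version, assuming `np.asmatrix` as the constructor:

```python
import numpy as np

m2 = np.asmatrix(np.arange(30).reshape(5, 6))

mask_flatten = np.asarray(m2[:, 1] > 10).flatten()
mask_a1 = (m2[:, 1] > 10).A1  # matrix -> 1-D ndarray

print(np.array_equal(mask_flatten, mask_a1))  # True
print(m2[mask_a1].shape)                      # (3, 6)
```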

Community
Saullo G. P. Castro