
I am trying to write some code that uses logical numpy arrays to index a larger array, similar to how MATLAB allows array indexing with logical arrays.

>>> import numpy as np
>>> m = 4
>>> n = 4
>>> unCov = np.random.randint(10, size=(m, n))
>>> rowCov = np.zeros(m, dtype=bool)
>>> colCov = np.ones(n, dtype=bool)
>>> unCov[rowCov, rowCov]
[]  # as expected
>>> unCov[colCov, colCov]
[0 8 3 3]  # diagonal values of unCov, as expected
>>> unCov[rowCov, colCov]
ValueError: shape mismatch: objects cannot be broadcast to a single shape

For this last evaluation, I expected an empty array, similar to what MATLAB returns. I'd rather not have to check rowCov/colCov for True elements prior to indexing. Why is this happening, and is there a better way to do this?

gariepy
  • Umm...matlab returns the full matrix in the latter case, not the diagonal values. Isn't that right? – Andras Deak -- Слава Україні Dec 15 '15 at 18:14
  • @Andras Oh, good point, you are correct! Sorry, I was not focusing on that aspect of the MATLAB comparison. I guess numpy and MATLAB are more different than I realized in this aspect. – gariepy Dec 15 '15 at 19:43
  • Take a look at this [similar question](http://stackoverflow.com/questions/22357622/logical-indexing-in-numpy-with-two-indices-as-in-matlab/22366062#22366062); I think it answers your question. – Bi Rico Dec 15 '15 at 19:59
  • @Bi Rico: Yes, thank you! That works just how I want it to! – gariepy Dec 15 '15 at 20:09

1 Answer

As I understand it, numpy translates each boolean index array into an array of integer indices: arr[[True,False],[False,True]] becomes arr[[0],[1]] for an ndarray of shape (2,2), which picks out the single element arr[0,1]. In your last case, however, one of the index arrays is all False, so it corresponds to an index array of length 0. It is paired with the other, all-True index array, which corresponds to an index array of length 4.
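The translation described above can be checked directly. This is a minimal sketch, assuming a reasonably recent NumPy; the names `arr`, `rows` and `cols` are made up for illustration and do not come from the question.

```python
import numpy as np

arr = np.arange(4).reshape(2, 2)       # illustrative 2 x 2 array
rows = np.array([True, False])
cols = np.array([False, True])

# A boolean index array acts like the integer positions of its True entries.
by_mask = arr[rows, cols]
by_index = arr[np.nonzero(rows)[0], np.nonzero(cols)[0]]

print(by_mask, by_index)               # both hold the single element arr[0, 1]
```

Both expressions pair the row positions with the column positions element by element, which is why the result is a 1-d array rather than a submatrix.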

From the numpy manual:

If the index arrays do not have the same shape, there is an attempt to broadcast them to the same shape. If they cannot be broadcast to the same shape, an exception is raised:

In your case, the error is exactly due to this:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-1411-28e41e233472> in <module>()
----> 1 unCov[colCov,rowCov]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (0,)
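The two shapes in that message appear to be the numbers of True values in each mask. A small sketch of this, assuming a deterministic stand-in for the random matrix (`np.arange` instead of `np.random.randint`) and catching both exception types, since the question and this answer report different ones:

```python
import numpy as np

unCov = np.arange(16).reshape(4, 4)    # stand-in for the random matrix
rowCov = np.zeros(4, dtype=bool)       # no True entries
colCov = np.ones(4, dtype=bool)        # four True entries

# Each mask becomes an integer index array whose length is its True count.
row_idx = np.nonzero(rowCov)[0]
col_idx = np.nonzero(colCov)[0]
print(row_idx.shape, col_idx.shape)

# Lengths 4 and 0 cannot be broadcast together, so the indexing fails.
# Depending on the NumPy version this is a ValueError or an IndexError.
try:
    unCov[colCov, rowCov]
    failed = False
except (IndexError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)
```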

MATLAB, on the other hand, automatically returns an empty array if the index array is empty along any given dimension.


This actually highlights a fundamental difference between the logical indexing in MATLAB and numpy. In MATLAB, vectors in subscript indexing always slice out a subarray. That is, both

arr([1,2],[1,2])

and

arr([true,true],[true,true])

will return the 2 x 2 submatrix of the matrix arr. If the logical index vectors are shorter than the given dimension of the array, the missing indexing elements are assumed to be false. Fun fact: the index vector can also be longer than the given dimension, as long as the excess elements are all false. So the above is also equivalent to

arr([true,true,false,false],[true,true])

and

arr([true,true,false,false,false,false,false],[true,true])

for a 4 x 4 array (for the sake of argument).

In numpy, however, indexing with boolean-valued numpy arrays in this way will try to extract a vector. Furthermore, the boolean index vectors should be the same length as the dimension they are indexing into. In your 4 x 4 example,

unCov[np.array([True,True]),np.array([True,True])]

and

unCov[np.array([True,True,False,False,False]),np.array([True,True,False,False,False])]

both return the first two diagonal elements, so not a submatrix but rather a vector. Furthermore, they also give a less-than-encouraging warning along the lines of

/usr/bin/ipython:1: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 4 but corresponding boolean dimension is 5

So, in numpy, your logical indexing vectors should be the same length as the corresponding dimensions of the ndarray; newer NumPy releases raise an IndexError for such a length mismatch rather than just warning. And then what I wrote above holds true: the logical values are translated into indices, and the result is expected to be a vector. The length of this vector is the number of True elements in each index vector, so if your boolean index vectors have different numbers of True elements, the indexing doesn't make sense, and you get the error you saw.
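If the goal is the MATLAB-style behaviour, where each mask selects along its own axis, one option is `np.ix_`, which accepts boolean sequences and builds an open mesh from them. The following is a sketch under the same assumptions as the question (4 x 4 array, one all-False and one all-True mask), again with a deterministic stand-in for the random matrix:

```python
import numpy as np

unCov = np.arange(16).reshape(4, 4)    # stand-in for the random matrix
rowCov = np.zeros(4, dtype=bool)
colCov = np.ones(4, dtype=bool)

# np.ix_ turns the masks into an open mesh, so rows and columns are
# selected independently instead of being paired element by element.
empty = unCov[np.ix_(rowCov, colCov)]
full = unCov[np.ix_(colCov, colCov)]

print(empty.shape, full.shape)         # (0, 4) (4, 4)
```

With this form there is no need to check the masks for True elements first: an all-False mask simply yields an array with a zero-length axis.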

  • I was basically going to write this. +1. – rayryeng Dec 15 '15 at 18:23
  • I appreciate the explanation...very good info! It seems slightly non-Pythonic that you cannot perform general indexing like unCov[rowCov, colCov] without first checking if one of the arrays is all False. Bummer. – gariepy Dec 15 '15 at 19:47