I've been going crazy trying to figure out what stupid thing I'm doing wrong here.

I'm using NumPy, and I have specific row indices and specific column indices that I want to select from. Here's the gist of my problem:

import numpy as np

a = np.arange(20).reshape((5,4))
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11],
#        [12, 13, 14, 15],
#        [16, 17, 18, 19]])

# If I select certain rows, it works
print(a[[0, 1, 3], :])
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [12, 13, 14, 15]])

# If I select certain rows and a single column, it works
print(a[[0, 1, 3], 2])
# array([ 2,  6, 14])

# But if I select certain rows AND certain columns, it fails
print(a[[0, 1, 3], [0, 2]])
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# ValueError: shape mismatch: objects cannot be broadcast to a single shape

Why is this happening? Surely I should be able to select the 1st, 2nd, and 4th rows, and 1st and 3rd columns? The result I'm expecting is:

a[[0,1,3], [0,2]] => [[0,  2],
                      [4,  6],
                      [12, 14]]
smci
Mike C
  • Tagged [tag:numpy-slicing] to improve findability. (Also the terms 'slice' and 'slicing' do not occur in the plaintext, we could use some duplicates with those terms closed into this) – smci May 25 '18 at 10:24
  • This is a duplicate of https://stackoverflow.com/questions/19161512/numpy-extract-submatrix – David John Coleman II Dec 17 '18 at 05:57

4 Answers


As Toan suggests, a simple hack would be to just select the rows first, and then select the columns over that.

>>> a[[0,1,3], :]            # Returns the rows you want
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [12, 13, 14, 15]])
>>> a[[0,1,3], :][:, [0,2]]  # Selects the columns you want as well
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

[Edit] The built-in method: np.ix_

I recently discovered that NumPy provides a built-in one-liner that does exactly what @Jaime suggested, without the explicit broadcasting syntax (which is harder to read). From the docs:

Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].

So you use it like this:

>>> a = np.arange(20).reshape((5,4))
>>> a[np.ix_([0,1,3], [0,2])]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

And the way it works is that it takes care of aligning arrays the way Jaime suggested, so that broadcasting happens properly:

>>> np.ix_([0,1,3], [0,2])
(array([[0],
        [1],
        [3]]), array([[0, 2]]))
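
To see why these two arrays produce the cross product, it may help to look at their shapes. The following is a small sketch (the variable names rows, cols and result are only illustrative) showing that a (3, 1) array and a (1, 2) array broadcast to a common (3, 2) shape, which is also the shape of the selection:

```python
import numpy as np

a = np.arange(20).reshape((5, 4))

# np.ix_ returns one index array per dimension, each with a single
# non-trivial axis.
rows, cols = np.ix_([0, 1, 3], [0, 2])
print(rows.shape, cols.shape)          # (3, 1) (1, 2)

# The two index arrays broadcast to a common shape, and the result
# of the indexing operation takes that shape.
print(np.broadcast(rows, cols).shape)  # (3, 2)

# Indexing with the tuple returned by np.ix_ is the same as passing
# the two arrays separately.
result = a[rows, cols]
print(result.tolist())                 # [[0, 2], [4, 6], [12, 14]]
```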

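Also, as MikeC says in a comment, np.ix_ has the advantage of supporting assignment, which my first (pre-edit) answer did not. Strictly speaking, indexing with np.ix_ still returns a copy rather than a view, like any fancy indexing. Assignment works because a[index] = value is a single indexing operation on a itself, whereas the chained form a[rows, :][:, cols] = value assigns into a temporary copy and leaves a untouched. This means you can now assign to the indexed array: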

>>> a[np.ix_([0,1,3], [0,2])] = -1
>>> a    
array([[-1,  1, -1,  3],
       [-1,  5, -1,  7],
       [ 8,  9, 10, 11],
       [-1, 13, -1, 15],
       [16, 17, 18, 19]])
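
One way to check what happens on assignment is to compare the chained form with the np.ix_ form on copies of the same array. This is only a sketch; the names idx, sub, b and c are illustrative:

```python
import numpy as np

a = np.arange(20).reshape((5, 4))
idx = np.ix_([0, 1, 3], [0, 2])

# Reading with fancy indexing produces a new array that does not
# share memory with the original.
sub = a[idx]
print(np.shares_memory(a, sub))   # False

# Chained indexing: the first [] makes a temporary copy, and the
# assignment only modifies that temporary.
b = a.copy()
b[[0, 1, 3], :][:, [0, 2]] = -1
print(np.array_equal(a, b))       # True, b is unchanged

# Single indexing operation: the assignment writes into c itself.
c = a.copy()
c[idx] = -1
print(c[[0, 1, 3], 0].tolist())   # [-1, -1, -1]
```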
Praveen
  • In a few tests, I also found `np.ix_` to be faster than the method of selecting first columns and then rows (usually about 2x as fast on my tests of square arrays of sizes 1K-10K where you reindex all rows and columns). – Nathan Mar 08 '19 at 21:56

Fancy indexing requires the index arrays for all dimensions to broadcast to a common shape. You are providing 3 indices for the first dimension and only 2 for the second, and shapes (3,) and (2,) cannot be broadcast together, hence the error. You want to do something like this:

>>> a[[[0, 0], [1, 1], [3, 3]], [[0,2], [0,2], [0, 2]]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

That is of course a pain to write, so you can let broadcasting help you:

>>> a[[[0], [1], [3]], [0, 2]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])
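
To make the broadcasting step concrete, here is a small sketch using np.broadcast_arrays (the names full_rows and full_cols are illustrative). It expands the nested row list and the flat column list into the explicit index arrays written out by hand earlier. The rows need the extra nesting so that they have shape (3, 1); the columns, with shape (2,), then line up along the last axis:

```python
import numpy as np

a = np.arange(20).reshape((5, 4))

rows = np.array([[0], [1], [3]])  # shape (3, 1)
cols = np.array([0, 2])           # shape (2,)

# Broadcasting stretches both index arrays to shape (3, 2).
full_rows, full_cols = np.broadcast_arrays(rows, cols)
print(full_rows.tolist())  # [[0, 0], [1, 1], [3, 3]]
print(full_cols.tolist())  # [[0, 2], [0, 2], [0, 2]]

# Indexing with the compact and the expanded forms gives the same result.
print(np.array_equal(a[rows, cols], a[full_rows, full_cols]))  # True
```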

This is much simpler to do if you index with arrays, not lists:

>>> row_idx = np.array([0, 1, 3])
>>> col_idx = np.array([0, 2])
>>> a[row_idx[:, None], col_idx]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])
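
As a rough illustration of why the question's first two examples work and the third does not: index lists of equal length are paired element by element, a scalar broadcasts against any list, and lists of different lengths cannot be combined. The exception type is caught loosely here because it differs between NumPy versions (the question shows ValueError; newer releases raise IndexError, as far as I know):

```python
import numpy as np

a = np.arange(20).reshape((5, 4))

# Equal-length index lists are paired up: this picks a[0, 0], a[1, 2]
# and a[3, 3], not a 3x3 block.
paired = a[[0, 1, 3], [0, 2, 3]]
print(paired.tolist())      # [0, 6, 15]

# A scalar column index broadcasts against the three row indices.
single_col = a[[0, 1, 3], 2]
print(single_col.tolist())  # [2, 6, 14]

# Shapes (3,) and (2,) cannot be broadcast together, so this fails.
try:
    a[[0, 1, 3], [0, 2]]
    failed = False
except (IndexError, ValueError) as exc:
    failed = True
    print(type(exc).__name__)
```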
Jaime
  • Thanks, I did not know you could do this! Broadcasting is weird and wonderful... After two years of numpy, I'm still getting used to it. – Praveen Apr 08 '14 at 08:15
  • Thanks! While the other answers did answer my question correctly in terms of returning the selected matrix, this answer addressed that while also addressing the issue of assignment (how to set a[[0,1,3], [0,2]] = 0, for example). – Mike C Apr 08 '14 at 16:34
  • @Jaime - Just yesterday I discovered a one-liner built-in to do exactly the broadcasting trick you suggest: [np.ix_](http://stackoverflow.com/a/22931212/525169) – Praveen Jan 09 '16 at 10:29
  • Could someone provide an explanation as to why the syntax works like this? What is the reason it works for both first examples but not the third? And also, how does encapsulating the wanted indices in their own lists solve this? Thank you – Imad Jan 16 '18 at 20:38
  • Why do the rows need to be nested and the cols do not? – AturSams Mar 26 '19 at 09:40
  • This always takes me hours and a trip back to `stackoverflow` to remember – imbr Dec 13 '19 at 18:40

Use:

>>> a[[0,1,3]][:,[0,2]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

Or, since columns 0 and 2 happen to be every second column, use a slice with step 2:

>>> a[[0,1,3],::2]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])
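
A short sketch of when the slice form applies: a list of rows can be combined with a column slice because the slice is applied to its own axis independently, but this only helps when the wanted columns are evenly spaced. For an irregular set such as columns 0 and 3 (chosen here purely for illustration), np.ix_ is one fallback:

```python
import numpy as np

a = np.arange(20).reshape((5, 4))

# Columns 0 and 2 are evenly spaced, so a step-2 slice selects them.
strided = a[[0, 1, 3], ::2]
print(strided.tolist())    # [[0, 2], [4, 6], [12, 14]]

# Columns 0 and 3 are not reachable with a single slice of this array
# that skips columns 1 and 2 only by step, so index arrays are used.
irregular = a[np.ix_([0, 1, 3], [0, 3])]
print(irregular.tolist())  # [[0, 3], [4, 7], [12, 15]]
```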
Toan Nguyen
  • While this is correct, you should consider posting a bit of further information explaining *why* it is correct. – ebarr Apr 08 '14 at 07:44

Using np.ix_ is the most convenient way to do it (as answered by others), but it can also be done as follows:

>>> rows = [0, 1, 3]
>>> cols = [0, 2]

>>> (a[rows].T)[cols].T

array([[ 0,  2],
       [ 4,  6],
       [12, 14]])
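
As a quick sanity check, the approaches from the different answers can be compared against each other. This is only a sketch; the dictionary keys are labels made up for this example:

```python
import numpy as np

a = np.arange(20).reshape((5, 4))
rows = [0, 1, 3]
cols = [0, 2]

# Use the np.ix_ result as the reference selection.
expected = a[np.ix_(rows, cols)]

candidates = {
    "chained": a[rows][:, cols],
    "transpose": a[rows].T[cols].T,
    "broadcast": a[np.array(rows)[:, None], np.array(cols)],
}

# Each approach should select the same 3x2 block for reading.
for name, result in candidates.items():
    print(name, np.array_equal(result, expected))
```

All three should print True. Note that this only compares the values read out; it says nothing about which forms can be used on the left-hand side of an assignment.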
Andreas K.