Indexing on ndarrays results in wrong shape

Question

The following snippet

x = np.ones((10,10,10))
x = x[2,:,[2,3,7]]
print(x.shape)

results in x.shape = (3,10) instead of (10,3). How do I use a list to index the 3rd dimension to get a shape (10,3)?

You will need to transpose manually or reduce the axes one at a time: `x[2][:, [2, 3, 7]]` — Chrysophylaxs, Mar 09 '23 at 19:14
Thanks. That's weird. I was sure it had the same syntax as PyTorch. — Kong, Mar 09 '23 at 19:18
No problem. It's due to the rules for advanced indexing... here is some info in case you're interested. https://numpy.org/doc/stable/user/basics.indexing.html#combining-advanced-and-basic-indexing — Chrysophylaxs, Mar 09 '23 at 19:21
I just realized that there is a way to do it in one indexing call, but you'll have to forego the slice and create indices that broadcast to `(10, 3)`: `x[2, np.arange(10)[:, None], [2, 3, 7]]`. It might be more trouble than it's worth though... ;) — Chrysophylaxs, Mar 09 '23 at 19:44

Pranav Hosangadi · Accepted Answer · 2023-03-09T22:10:09.543

NB: I changed your array to x = np.arange(125).reshape((5, 5, 5)) and adjusted the indexes/slices to y = x[2, :, [0, 2, 4]] to make it easier to see what is being selected.

TL;DR You can either transpose your result, or represent the index on the first axis as a slice, and squeeze that result:

>>> x[2, :, [0, 2, 4]].T
array([[50, 52, 54],
       [55, 57, 59],
       [60, 62, 64],
       [65, 67, 69],
       [70, 72, 74]])

>>> x[2:3, :, [0, 2, 4]].squeeze()
array([[50, 52, 54],
       [55, 57, 59],
       [60, 62, 64],
       [65, 67, 69],
       [70, 72, 74]])

I think a more interesting question is why this happens. For the longest time, I internalized this as "numpy does weird stuff sometimes, just memorize it", but it does have an explanation that you can apply to any general case. From the link @Chrysophylaxs shared, combined advanced and basic indexing is handled by first slicing, then advanced indexing. If you do that here, you get:

>>> x[2, :]
array([[50, 51, 52, 53, 54],
       [55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64],
       [65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74]])

>>> x[2, :][:, [0, 2, 4]]
array([[50, 52, 54],
       [55, 57, 59],
       [60, 62, 64],
       [65, 67, 69],
       [70, 72, 74]])

Which is of the shape (5, 3), as we expected.

However, this is not what numpy is doing. As mentioned in that link, when a single index on one dimension is requested with advanced indexing on another, the single index is treated as advanced indexing instead of a slice.

Two cases of index combination need to be distinguished:

The advanced indices are separated by a slice, Ellipsis or newaxis. For example x[arr1, :, arr2].

The advanced indices are all next to each other. For example x[..., arr1, arr2, :] but not x[arr1, :, 1] since 1 is an advanced index in this regard.

When this is the case,

the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that.

The dimensions resulting from the advanced indexing operation would be (3,): the scalar index 2 contributes no dimensions, and the list index [0, 2, 4] contributes one dimension of size 3 (thanks to @Mad Physicist and @Chrysophylaxs for the clarification). This is why we get a (3, 5) array (and you get a (3, 10) array):

>>> x[2, :, [0, 2, 4]]
array([[50, 55, 60, 65, 70],
       [52, 57, 62, 67, 72],
       [54, 59, 64, 69, 74]])
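As a quick check of the two cases quoted above, here is a minimal sketch that compares the separated layout with two adjacent layouts on the same 5x5x5 array. The variable names are only illustrative.

```python
import numpy as np

x = np.arange(125).reshape((5, 5, 5))
cols = [0, 2, 4]

# Advanced indices (the scalar 2 and cols) are separated by a slice:
# the broadcast index dimension is placed first in the result.
separated = x[2, :, cols]

# Advanced indices sit next to each other:
# the broadcast index dimension stays where the indices were.
adjacent_tail = x[:, 2, cols]
adjacent_head = x[2, cols, :]

print(separated.shape)      # (3, 5)
print(adjacent_tail.shape)  # (5, 3)
print(adjacent_head.shape)  # (3, 5)
```

Only the separated layout moves the size-3 dimension away from the position where the list index was written.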

If you used a similar advanced indexing operation with two indices on the first dimension, you'd get a result with a shape of (2, 3, 5):

>>> x[[[2], [3]], :, [0, 2, 4]]
array([[[50, 55, 60, 65, 70],
        [52, 57, 62, 67, 72],
        [54, 59, 64, 69, 74]],

       [[75, 80, 85, 90, 95],
        [77, 82, 87, 92, 97],
        [79, 84, 89, 94, 99]]])

>>> x[[[2], [3]], :, [0, 2, 4]].shape
(2, 3, 5)
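The shapes of both examples above can be predicted by broadcasting the shapes of the advanced indices against each other and then appending the sliced dimension. The sketch below assumes NumPy 1.20 or newer for np.broadcast_shapes, and predicted_shape is a hypothetical helper that only covers the x[first, :, last] pattern.

```python
import numpy as np

x = np.arange(125).reshape((5, 5, 5))

def predicted_shape(first, last, sliced_len):
    """Hypothetical helper: predict the shape of x[first, :, last]."""
    # The advanced indices are broadcast against each other...
    index_shape = np.broadcast_shapes(np.shape(first), np.shape(last))
    # ...and, because a slice separates them, that broadcast shape comes
    # first, followed by the sliced (subspace) dimension.
    return index_shape + (sliced_len,)

print(predicted_shape(2, [0, 2, 4], 5))           # (3, 5)
print(predicted_shape([[2], [3]], [0, 2, 4], 5))  # (2, 3, 5)
```

The scalar 2 has shape (), so it adds nothing to the broadcast, while [[2], [3]] has shape (2, 1) and broadcasts with (3,) to (2, 3).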

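To get a (5, 3) array (or a (10, 3) array in your case), change the first index to a slice, and then squeeze the result. With the scalar gone, [0, 2, 4] is the only advanced index, so nothing separates the advanced indices any more and numpy leaves the indexed dimension in its original position, which matches the slice-then-index picture from earlier.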

>>> x[2:3, :, [0, 2, 4]]      # (1, 5, 3)
array([[[50, 52, 54],
        [55, 57, 59],
        [60, 62, 64],
        [65, 67, 69],
        [70, 72, 74]]])

>>> x[2:3, :, [0, 2, 4]].squeeze()
array([[50, 52, 54],
       [55, 57, 59],
       [60, 62, 64],
       [65, 67, 69],
       [70, 72, 74]])
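One caveat worth checking: squeeze() without arguments drops every length-1 axis, not just the first one. The sketch below uses a made-up array whose middle axis has length 1 (an assumption for illustration, not the array from the question) to show that squeeze(axis=0) or a plain [0] is the safer way to remove only the leading axis.

```python
import numpy as np

# Hypothetical array with a length-1 middle axis.
x = np.ones((10, 1, 10))
sub = x[2:3, :, [2, 3, 7]]

print(sub.shape)                  # (1, 1, 3)
print(sub.squeeze().shape)        # (3,)   both length-1 axes are gone
print(sub.squeeze(axis=0).shape)  # (1, 3) only the leading axis is removed
print(sub[0].shape)               # (1, 3) same effect with plain indexing
```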

Alternatively, just transpose (or move/swap the axes of) the result you get from your regular indexing:

>>> x[2, :, [0, 2, 4]].T
array([[50, 52, 54],
       [55, 57, 59],
       [60, 62, 64],
       [65, 67, 69],
       [70, 72, 74]])
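For comparison, here is a small sketch that puts the options mentioned in this thread side by side, plus np.take as one more possibility, and checks that they all agree. The variable names are illustrative.

```python
import numpy as np

x = np.arange(125).reshape((5, 5, 5))
cols = [0, 2, 4]

transposed = x[2, :, cols].T                      # transpose the (3, 5) result
moved = np.moveaxis(x[2, :, cols], 0, -1)         # move the index axis to the end
two_step = x[2][:, cols]                          # reduce the axes one at a time
taken = np.take(x[2], cols, axis=1)               # take along the last axis
broadcast = x[2, np.arange(5)[:, None], cols]     # indices that broadcast to (5, 3)
sliced = x[2:3, :, cols].squeeze(axis=0)          # slice first, then drop axis 0

for result in (moved, two_step, taken, broadcast, sliced):
    print(result.shape, np.array_equal(result, transposed))
```

Which one reads best is a matter of taste; the two-step form is probably the easiest to understand at a glance.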

Excellent answer! Though the slice trick gets close, it does result in an extra size 1 dimension, so the shape would be `(1, 5, 3)` :( — Chrysophylaxs, Mar 09 '23 at 19:50
@Chrysophylaxs Thanks and you're absolutely right! I forgot to include that we need to squeeze the first dimension. Thanks for catching that! — Pranav Hosangadi, Mar 09 '23 at 19:53
I added an answer looking at the strides for various alternatives. — hpaulj, Mar 09 '23 at 21:21
The advanced indexing dimension from a scalar is not `1`, but rather `()`. It's not squeezed out: it's never there to begin with. — Mad Physicist, Mar 09 '23 at 21:32
I don't agree with the new edit: in `x[2, :, [0, 2, 4]]`, the indices broadcast to a `(3,)` space. No `(0, 3)` or squeezing going on here... unless I'm misunderstanding. — Chrysophylaxs, Mar 09 '23 at 22:04
@Chrysophylaxs I wanted to say that the scalar contributes no dimensions. I rephrased that part to say it more directly — Pranav Hosangadi, Mar 09 '23 at 22:10
I just realized I was late to the party... my apologies. Indices are always broadcast, which is the logic that determines the size of the space that gets inserted. That might be noteworthy — Chrysophylaxs, Mar 09 '23 at 22:13

hpaulj · Answer 2 · 2023-03-09T21:35:34.453

I think the strides and base give some added insight, if not an actual explanation.

Make a 3d array:

In [77]: x = np.arange(24).reshape(2,3,4); x.shape, x.strides
Out[77]: ((2, 3, 4), (48, 16, 4))
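Strides are reported in bytes, so the numbers above correspond to 4-byte integers. On a platform where the default integer is 8 bytes wide, every stride doubles. The sketch below fixes the dtype explicitly (my assumption, to make the output reproducible) and also expresses the strides in elements rather than bytes.

```python
import numpy as np

# Fix the dtype so the byte strides do not depend on the platform default.
x = np.arange(24, dtype=np.int32).reshape(2, 3, 4)
print(x.strides)  # (48, 16, 4)

# Dividing by the item size gives strides counted in elements.
element_strides = tuple(s // x.itemsize for s in x.strides)
print(element_strides)  # (12, 4, 1)

# The same data stored as 8-byte integers has doubled byte strides.
x64 = x.astype(np.int64)
print(x64.strides)  # (96, 32, 8)
```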

The slice-in-the-middle case that gives 'weird' shape:

In [78]: y = x[1,:,[1,2]]; y.shape, y.strides
Out[78]: ((2, 3), (12, 4))

In [79]: y
Out[79]: 
array([[13, 17, 21],
       [14, 18, 22]])

That y is a copy (own base), and the strides are normal for that shape.
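A way to confirm the copy without reading the strides is np.shares_memory. In this sketch, writing into y leaves x untouched, whereas a basic slice of the same region is a view that does share memory with x.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

y = x[1, :, [1, 2]]            # advanced indexing: a copy
print(np.shares_memory(x, y))  # False

y[0, 0] = -1                   # modify the copy...
print(x[1, 0, 1])              # 13, the original is unchanged

v = x[1, :, 1:3]               # basic slicing only: a view
print(np.shares_memory(x, v))  # True
```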

Now try the two-step indexing that gives the expected shape:

In [80]: z = x[1][:,[1,2]]; z.shape, z.strides
Out[80]: ((3, 2), (4, 12))

The strides are "reversed", which is what we get from a transpose.

In [81]: z
Out[81]: 
array([[13, 14],
       [17, 18],
       [21, 22]])

And in fact its base has the same shape and values as the y array.

In [82]: z.base
Out[82]: 
array([[13, 17, 21],
       [14, 18, 22]])

Advanced indexing of columns (for a 2d array) produces this transposed view.

The y case looks a lot like the [:,[1,2]] indexing, but it couldn't (for one reason or another) perform the transpose that we see in z.

With the fully broadcasted version:

In [87]: w = x[1,np.arange(3)[:,None],[1,2]]; w.shape, w.strides
Out[87]: ((3, 2), (8, 4))

In [88]: w
Out[88]: 
array([[13, 14],
       [17, 18],
       [21, 22]])

This has the same shape and values as z, but the strides are different.
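If the memory layout matters downstream (for example when handing the array to code that expects C order), the values can be compared and the layout normalized explicitly. This is only a sketch; the exact strides of z and w are an implementation detail, so no expected output is given for them.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

z = x[1][:, [1, 2]]                       # two-step indexing
w = x[1, np.arange(3)[:, None], [1, 2]]   # fully broadcast indices

print(np.array_equal(z, w))               # True: same shape and values

# Force a C-contiguous layout regardless of how z was produced.
z_c = np.ascontiguousarray(z)
print(z_c.flags['C_CONTIGUOUS'])          # True

# Layouts may differ between NumPy versions; inspect them rather than assume.
print(z.strides, w.strides, z_c.strides)
```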

And if we replace the first scalar index with a slice:

In [103]: u = x[1:2,:,[1,2]][0]; u.shape, u.strides
Out[103]: ((3, 2), (4, 12))

In [104]: u.base
Out[104]: 
array([[[13, 17, 21]],

       [[14, 18, 22]]])

In [105]: u.base.shape
Out[105]: (2, 1, 3)

Note that the third dimension [1,2] index is first, just as in the 'slice-in-the-middle' case; it transposes the (2,1,3) base to a (1,3,2), which I reduced to (3,2).

As I understand it, advanced indexing always produces a copy. However, after `z = x[1][:,[1,2]]` it seems we have that `z` is a (transposed) view of another array. Why is that? Where is this base array coming from? — Chrysophylaxs, Mar 09 '23 at 21:58
@Chrysophylaxs, apparently the `[:,[1,2]]` indexing creates an array with the `[1,2]` dimension first, with the slice tacked on, just as in the `[1,:,[1,2]]` case. That's the `base`. It is then transposed to put the slice dimension first. Advanced indexing, even when the result has the correct shape, is more complex than we realize. — hpaulj, Mar 09 '23 at 22:16
Aha, so it seems like there is some transposing going on regardless of whether we're dealing with a "slice-in-the-middle" or not... thanks :) — Chrysophylaxs, Mar 09 '23 at 22:19
In fact, scratch that. It seems like the new dimension(s) introduced by the advanced indices being inserted at the front is the "default" behavior, with them only afterwards being transposed in position in case there is no ambiguity (no slice-in-the-middle). The docs/guide had always given me the impression that it was the other way around. — Chrysophylaxs, Mar 09 '23 at 22:39
