3

Consider the following:

import numpy as np

X = np.ones((5,5))

print(X[:,0].shape)
print(X[:,0:1].shape)
  • X[:,0].shape returns (5,)

  • X[:,0:1].shape returns (5,1).

In both cases the same column is selected, so why do the two expressions return different shapes? What is the logic behind it?


Exactly the same happens with X[:,-1:].shape and X[:,-1].shape.

seralouk

2 Answers

1

Indexing with an integer i returns the same values as indexing with the slice i:i+1, but the integer removes that axis, so the result has one dimension fewer. A slice always keeps the axis, even when it selects a single element. This is explained in the docs:

In particular, a selection tuple with the p-th element an integer (and all other entries :) returns the corresponding sub-array with dimension N - 1
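
As a small illustration of the quoted rule, here is a minimal sketch on a 3-dimensional array (the array X and the names by_int and by_slice are made up for this example, not taken from the question). It compares an integer and a length-one slice on the same axis:

```python
import numpy as np

X = np.arange(24).reshape(2, 3, 4)  # N = 3 dimensions

by_int = X[:, 1, :]      # integer on axis 1: that axis is dropped
by_slice = X[:, 1:2, :]  # slice on axis 1: the axis is kept with length 1

print(by_int.shape)    # (2, 4)
print(by_slice.shape)  # (2, 1, 4)

# Same values either way; only the number of dimensions differs.
print(np.array_equal(by_int, by_slice[:, 0, :]))  # True
```

The integer index yields N - 1 = 2 dimensions, while the slice keeps all three.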


We could write a simple subclass to take a closer look at how np.ndarray handles indexing, and see what the __getitem__ dunder is receiving in each call:

class ndarray_getitem_print(np.ndarray):
    def __getitem__(self, t):
        print(t)
        return super().__getitem__(t)

Now let's instantiate ndarray_getitem_print and see how indexing with a slice differs from indexing with an integer:

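# Calling the class with a shape, like np.ndarray itself, leaves the values
# uninitialised; viewing an array of ones gives the values shown below.
a = np.ones((5, 5)).view(ndarray_getitem_print)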

a[:,0:1]

(slice(None, None, None), slice(0, 1, None))
(-5, -1)
(-4, -1)
(-3, -1)
(-2, -1)
(-1, -1)
ndarray_getitem_print([[1.],
                       [1.],
                       [1.],
                       [1.],
                       [1.]])

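The first tuple printed is the index we passed in. The remaining tuples appear to come from NumPy indexing into the result while it builds the printed representation, and they have one entry per dimension of that result: two entries, (-k, -1), for the 2-D output above. Indexing along the second axis with the integer 0 instead produces a one-dimensional ndarray, so each of those tuples has a single entry, i.e. (-k,):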

a[:,0]

(slice(None, None, None), 0)
(-5,)
(-4,)
(-3,)
(-2,)
(-1,)
ndarray_getitem_print([1., 1., 1., 1., 1.])
yatu
0

I think it's because of the indexing. If you print the results of the two expressions, they are different. An NxM np.array behaves like arrays nested inside an array. When you index with X[:,0], that nesting is lost, so shape only sees what you got back, which is a single one-dimensional array rather than a two-dimensional one. AFAIK, indexing with a single integer takes the value out of its container (as with a list), whereas addressing by a range slices the container and keeps it.

You can see this when you check the types:

>>> b = X[:,0]
>>> c = X[:,0:1]
>>> type(b)
<class 'numpy.ndarray'>
>>> type(c)
<class 'numpy.ndarray'>
>>> type(c[0])
<class 'numpy.ndarray'>
>>> type(b[0])
<class 'numpy.float64'>

The values themselves show the difference between the two:

>>> c[0]
array([1.])
>>> b[0]
1.0
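
The same distinction can be seen with a plain Python list, which may make the logic easier to picture. This is only an analogy, sketched with a made-up list named row; it is not how NumPy is implemented:

```python
row = [10, 20, 30]

item = row[0]     # integer index: the element itself, container removed
piece = row[0:1]  # slice: a list that still contains that element

print(item)   # 10
print(piece)  # [10]
print(type(item).__name__, type(piece).__name__)  # int list
```

An integer index takes one level of nesting away, while a slice returns the same kind of container with fewer items, which matches the (5,) versus (5,1) shapes in the question.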
Jason Chia