
Here is the code and the related documentation (http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris). I am confused by this line, data.target[[10, 25, 50]]: why does it use double brackets [[ ]]? If anyone could clarify, that would be great.

from sklearn.datasets import load_iris
data = load_iris()
print data.target[[10, 25, 50]]
print list(data.target_names)

thanks in advance, Lin

Lin Ma

2 Answers


Your confusion is understandable: this isn't "standard" Python by any means.

data.target in this case is an ndarray from numpy:

In [1]: from sklearn.datasets import load_iris
   ...: data = load_iris()
   ...: print data.target[[10, 25, 50]]
   ...: print list(data.target_names)
[0 0 1]
['setosa', 'versicolor', 'virginica']

In [2]: print type(data.target)
<type 'numpy.ndarray'>

numpy's ndarray implementation allows you to create a new array by providing a list of indices of the items you want. For example:

In [13]: data.target
Out[13]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [14]: data.target[1]
Out[14]: 0

In [15]: data.target[[1,2,3]]
Out[15]: array([0, 0, 0])

In [16]: print type(data.target[[1,2,3]])
<type 'numpy.ndarray'>

It does this by implementing its own __getitem__, the method Python calls for any obj[...] expression.
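To see why the double brackets are ordinary Python syntax, here is a minimal sketch using a hypothetical `Probe` class (not part of NumPy) whose `__getitem__` simply returns whatever key it receives. The outer brackets are the indexing operation; the inner brackets are a plain list literal passed as the key.

```python
class Probe:
    """Hypothetical helper: returns the key that the indexing expression passes in."""

    def __getitem__(self, key):
        return key


p = Probe()

single = p[10]             # key is the int 10
double = p[[10, 25, 50]]   # key is the list [10, 25, 50]
commas = p[10, 25, 50]     # key is the tuple (10, 25, 50)

print(type(single).__name__, single)
print(type(double).__name__, double)
print(type(commas).__name__, commas)
```

NumPy treats a list key as a request for the elements at those positions, whereas a tuple key addresses one position per dimension, which is why data.target[10, 25, 50] would raise an IndexError on a one-dimensional array.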

For more information, see the Indexing section of the NumPy array documentation.

Christian Ternus
  • Thanks Christian, vote up for the comprehensive reply. So it means retrieving the elements at indices `10, 25, 50` from the `numpy` array `data.target` to form a new list in Python? If so, `type(data.target[[10, 25, 50]])` should be a regular Python list, correct? – Lin Ma Aug 21 '16 at 00:11
  • @LinMa Look at the last line of code in the answer, it is still a `numpy.ndarray` – OneCricketeer Aug 21 '16 at 00:13
  • Thanks @cricket_007, vote up. It is all clear now. :) – Lin Ma Aug 21 '16 at 00:14
  • You can, in many ways, treat a `numpy.ndarray` as a list (iterating over it, slicing into it, etc.) but if you're going to do so I highly recommend reading [the documentation](http://docs.scipy.org/doc/numpy/reference/arrays.html) first, as there are some nonstandard behaviors that may trip you up. – Christian Ternus Aug 21 '16 at 00:14
  • Thanks Christian, vote up and mark your reply as answer. – Lin Ma Aug 21 '16 at 00:15
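
On the question raised in these comments, here is a minimal sketch that uses a small stand-in array rather than the iris data (the name `target` and its values are assumptions for illustration). It shows that indexing with a list yields a new `ndarray` holding a copy of the data, that `tolist()` converts it to a built-in list when one is really needed, and one of the nonstandard behaviors mentioned above: a basic slice of an `ndarray` is a view onto the same data, unlike a slice of a list.

```python
import numpy as np

target = np.array([0, 0, 1, 1, 2, 2])  # stand-in for data.target

picked = target[[0, 2, 4]]       # indexing with a list -> new ndarray (a copy)
as_list = picked.tolist()        # explicit conversion to a built-in list

picked[0] = 99                   # modifying the copy leaves `target` untouched
copy_left_original_alone = target[0] == 0

window = target[1:3]             # basic slice -> a view onto `target`
window[0] = 7                    # writes through to target[1]

print(type(picked).__name__, type(as_list).__name__)
print(as_list)
print(target)
```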

This retrieves elements from a numpy array A using "integer indexing" (the NumPy documentation calls it integer array indexing, as opposed to ordinary single-integer subscripts): a list of integers B selects the elements of A at those indices. For a one-dimensional A, the output is a numpy array with the same shape as the list B used as "input", and each output element is the value of A at the corresponding integer index, e.g.:

>>> import numpy
>>> a = numpy.array([0,1,4,9,16,25,36,49,64,81])
>>> a[[1,4,4,1,5,6,6,5]]
array([ 1, 16, 16,  1, 25, 36, 36, 25])

Integer indexing can be applied to more than one dimension, e.g.:

>>> b = numpy.array([[0,1,4,9,16],[25,36,49,64,81]]) # 2D array
>>> b[[0,1,0,1,1,0],[0,1,4,3,2,3]]   # row and column integer indices
array([ 0, 36, 16, 64, 49,  9])

or, the same example but with an input list of 2 dimensions, affecting the output shape:

>>> b[[[0,1,0],[1,1,0]],[[0,1,4],[3,2,3]]] # "row" and "column" 2D integer arrays
array([[ 0, 36, 16],
       [64, 49,  9]])

Also note that you can perform "integer indexing" with a numpy array as the index, rather than a list, e.g.:

>>> a[numpy.array([0,3,2,4,1])]
array([ 0,  9,  4, 16,  1])
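
Tying this back to the question: because an index array can itself be the result of another indexing operation, the selected class codes can be used directly to look up class names. The sketch below uses stand-in arrays built to mirror the iris layout (50 samples per class) instead of loading scikit-learn, so `target` and `target_names` here are assumptions, not the real dataset objects.

```python
import numpy as np

# Stand-ins mirroring the iris layout: 50 samples each of classes 0, 1, 2.
target = np.repeat([0, 1, 2], 50)
target_names = np.array(['setosa', 'versicolor', 'virginica'])

codes = target[[10, 25, 50]]      # class codes at positions 10, 25, 50
names = target_names[codes]       # use the codes as indices into the names

# The result takes the shape of the index array, here 2x2.
grid = target[np.array([[0, 50], [100, 149]])]

print(codes)
print(names)
print(grid.shape)
```

With the real dataset, the equivalent lookup would be data.target_names[data.target[[10, 25, 50]]].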
Tasos Papastylianou