
I just started studying machine learning and came across some code that I don't understand. I also don't know what to search for. I am stuck, so please help. Here is the example code:

from sklearn import datasets, model_selection
import matplotlib.pyplot as plt
import numpy

X, y = datasets.load_diabetes(return_X_y=True)
X = X[:, numpy.newaxis, 2] # I didn't understand this part
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.33)
plt.scatter(X_test, y_test,  color='black')
plt.show()

Where does the 2 come from? What is `numpy.newaxis`? (I think it is something that returns None, but I am not sure.) Also, what are these comma-separated parameters inside the square brackets? Please tell me what this is called or explain what it is. Thank you :)

iltan
  • 19
  • 5
  • 1
    See [https://stackoverflow.com/questions/29241056/how-does-numpy-newaxis-work-and-when-to-use-it](https://stackoverflow.com/questions/29241056/how-does-numpy-newaxis-work-and-when-to-use-it) for an explanation of `numpy.newaxis`. Beyond that, what have you tried to research the line in question? Have you checked/printed/examined `X` and `X[:, numpy.newaxis, 2]` to see what they look like and contain? – G. Anderson Jul 25 '22 at 22:07
  • Thank you for the response, @G.Anderson. I printed them and checked what they look like. It seems to take only the element at index 2 of every array inside `X`, but how? What is this called? – iltan Jul 25 '22 at 22:12
  • @G.Anderson I learned that `numpy.newaxis` is used to increase the dimension. What else can I use instead of it? For example, what kind of data type? – iltan Jul 25 '22 at 22:23
  • Please always give a **descriptive title** to your questions. – desertnaut Jul 26 '22 at 08:43

1 Answer

This is called indexing (or slicing, when a range such as `:` is used). You can read more about it in NumPy's user guide.

The 2 is an arbitrary index chosen by whoever wrote the code. Combined with `:`, it selects the third element (index 2) of every row, which is the bmi feature according to sklearn's diabetes dataset documentation.
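To illustrate the comma-separated index, here is a minimal sketch on a made-up 3x4 array (the array `a` is hypothetical and only stands in for the diabetes data):

```python
import numpy as np

# Hypothetical 3x4 array: 3 rows (samples) and 4 columns (features).
a = np.arange(12).reshape(3, 4)

# Inside the brackets, each comma-separated entry indexes one axis:
# ':' means "every row", 2 means "the column at index 2" (the third column).
third_column = a[:, 2]

print(a)
print(third_column)        # the values 2, 6, 10
print(third_column.shape)  # (3,) : the result is one-dimensional
```

Note that the plain `a[:, 2]` form drops the column axis, which is why the original code adds an axis back.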

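`numpy.newaxis` (often written `np.newaxis`) is a constant from NumPy, an alias for `None`. When it appears in an index, it inserts a new axis of length 1 at that position, which increases the dimension of the resulting ndarray. (Read more at the link mentioned in the comment.)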
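As a rough sketch of what the new axis does, and of a few spellings that should give the same result, consider the following (again on a hypothetical 3x4 array; the variable names are only for illustration):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # hypothetical 3x4 array

# The form used in the question: all rows, a new length-1 axis, column 2.
col = a[:, np.newaxis, 2]
print(np.newaxis is None)  # True: newaxis is just an alias for None
print(col.shape)           # (3, 1) : still two-dimensional

# Alternative ways to keep a single column two-dimensional.
with_none = a[:, None, 2]             # None instead of np.newaxis
with_list = a[:, [2]]                 # a list of column indices
with_slice = a[:, 2:3]                # a slice covering one column
with_reshape = a[:, 2].reshape(-1, 1) # select first, then reshape

for alt in (with_none, with_list, with_slice, with_reshape):
    print(np.array_equal(col, alt))
```

The list and slice forms are arguably easier to read; which one to use is mostly a matter of taste.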

Therefore, the code selects a single feature out of the 10 available in the dataset, keeping `X` two-dimensional with one column, before splitting the data into train and test sets.
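To see this on the real data, the shapes can be printed at each step. This is only a sketch: the name `X_bmi` and the `random_state=0` argument are my additions (the latter just makes the split repeatable) and are not part of the original code.

```python
from sklearn import datasets, model_selection
import numpy

X, y = datasets.load_diabetes(return_X_y=True)
print(X.shape)      # (442, 10) : 442 samples, 10 features

# Keep only the feature at index 2 (bmi), as a single-column 2D array.
X_bmi = X[:, numpy.newaxis, 2]
print(X_bmi.shape)  # (442, 1)

# random_state is an assumption added here for reproducibility.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X_bmi, y, test_size=0.33, random_state=0
)
print(X_train.shape, X_test.shape)
```

scikit-learn estimators generally expect a 2D feature array of shape (n_samples, n_features), which is a likely reason the code keeps the extra axis instead of using `X[:, 2]`.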

desertnaut