1
import numpy as np
from sklearn.preprocessing import StandardScaler
X = df.values[:, 1:]
X = np.nan_to_num(X)
Clus_dataSet = StandardScaler().fit_transform(X)
Clus_dataSet

Can anyone explain what this code does?


Hui Yang ONG
  • `df` is a pandas dataframe. Please read the [pandas tutorials](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html) and [Getting Started](https://pandas.pydata.org/docs/user_guide/index.html). It's searchable (box in top left) and answers these questions. – smci Nov 07 '20 at 01:59
  • And please see numpy doc for functions like [`np.nan_to_num()`](https://numpy.org/doc/stable/reference/generated/numpy.nan_to_num.html) – smci Nov 07 '20 at 02:02

4 Answers

4
  • df is a DataFrame with several columns, and apparently the target values are in the first column.

  • df.values returns a NumPy array with the underlying data of the DataFrame, without the index or column names.

  • [:, 1:] is a slice of that array that returns all rows and every column starting from the second one (the first column is index 0).
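To make the slicing concrete, here is a minimal sketch on a small made-up DataFrame. The column names `id`, `age`, and `income` are hypothetical and do not come from the question; they only stand in for "a first column to drop, followed by feature columns".

```python
import pandas as pd

# Hypothetical data: the first column is the one the slice drops.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "age": [25.0, 32.0, 47.0],
    "income": [40.0, 55.0, 80.0],
})

values = df.values      # plain NumPy array, no index or column labels
X = values[:, 1:]       # all rows, columns from position 1 onward

print(values.shape)     # (3, 3)
print(X.shape)          # (3, 2)
print(X[0])             # first row, without the "id" column
```

The row count is unchanged; only the column at position 0 is gone.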

RichieV
2

As Richie said, X = df.values[:,1:] makes X a NumPy array holding your DataFrame's data, minus the first column.

X = np.nan_to_num(X) replaces any NaN values with 0 (and, by default, positive or negative infinity with very large finite numbers).
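A small sketch of what `np.nan_to_num` does with its default arguments, using a made-up 2×2 array (the values are illustrative only):

```python
import numpy as np

# Made-up array containing one NaN and one infinity.
X = np.array([[1.0, np.nan],
              [np.inf, 4.0]])

cleaned = np.nan_to_num(X)

print(cleaned[0, 1])                               # 0.0
print(cleaned[1, 0] == np.finfo(np.float64).max)   # True
```

Whether 0 is a sensible stand-in for a missing value depends on the data, so it is worth checking how many NaNs there are before relying on this step.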

Clus_dataSet = StandardScaler().fit_transform(X) standardizes the data: each column is rescaled to zero mean and unit variance.
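As a rough illustration, the sketch below runs `StandardScaler` with its default settings on a made-up array and compares the result with the same calculation done by hand. The numbers are invented for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: 3 rows, 2 columns on very different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

scaled = StandardScaler().fit_transform(X)

# Same result by hand: subtract each column's mean and divide by its
# standard deviation (population std, i.e. ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))   # True
print(scaled.mean(axis=0))           # approximately 0 for each column
print(scaled.std(axis=0))            # approximately 1 for each column
```

Putting every column on the same scale matters for distance-based methods such as clustering, where a column with large raw values would otherwise dominate.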

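The final Clus_dataSet line simply displays the resulting array when it is the last line of a notebook cell or is typed in an interactive session.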

Be careful later when plotting: if you use the X variable, its columns are shifted by one relative to df, so column 0 of X is column 1 of df (X[:, 0] holds the same values as df.iloc[:, 1]).

For example: plt.scatter(X[:, 0], X[:, 3], s=area, c=labels.astype(float), alpha=0.5)

Here X[:, 0] is the first column of the new array, which was previously the second column of the DataFrame, df.iloc[:, 1].
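One way to keep track of the shifted positions is to slice the column names the same way as the values. This is only a sketch; the column names below are hypothetical and chosen for illustration.

```python
import pandas as pd

# Hypothetical column names, chosen only for illustration.
df = pd.DataFrame({
    "id": [1, 2],
    "age": [25.0, 32.0],
    "income": [40.0, 55.0],
})

X = df.values[:, 1:]
feature_names = list(df.columns[1:])   # names that line up with X's columns

for position, name in enumerate(feature_names):
    # Column `position` of X is column `position + 1` of df.
    assert (X[:, position] == df.iloc[:, position + 1].to_numpy()).all()
    print(position, name)
```

With `feature_names` available, `feature_names.index("income")` gives the column position to use in X when plotting.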

jaabh
0

df.values gives us the DataFrame's values as a NumPy array. df.values[:, 1:] then indexes into that array: it selects all the rows and all columns except the one at index 0.

Kuldip Chaudhari
0

df here refers to the DataFrame you are analysing.

In the second line of your code, df.values returns just the values, without the index or column labels of the DataFrame. The slice inside the brackets means you are taking all the rows of the DataFrame and skipping the column at index position 0 (which is probably the dependent variable, I assume).