sklearn: sklearn.preprocessing DeprecationWarning for arrays

Question

First, I looked at all the related questions. Very similar problems have been posted,
so I followed the suggestions from the links below, but none of them worked for me.
Data Conversion Error while applying a function to each row in pandas Python
Getting deprecation warning in Sklearn over 1d array, despite not having a 1D array

I also tried following the advice in the warning message, but that didn't work either.

The code looks like this:

# Importing the libraries
import numpy as np
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# avoid DataConversionError
X = X.astype(float)
y = y.astype(float)


## Attempt to avoid DeprecationWarning for sklearn.preprocessing
#X = X.reshape(-1,1)                  # attempt 1
#X = np.array(X).reshape((len(X), 1)) # attempt 2
#X = np.array([X])                    # attempt 3


# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)

# Fitting SVR to the dataset
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y)

# Predicting a new result
y_pred = regressor.predict(sc_X.transform(np.array([6.5])))
y_pred = sc_y.inverse_transform(y_pred)

The data looks like this:

Position,Level,Salary
Business Analyst,1,45000
Junior Consultant,2,50000
Senior Consultant,3,60000
Manager,4,80000
Country Manager,5,110000
Region Manager,6,150000
Partner,7,200000
Senior Partner,8,300000
C-level,9,500000
CEO,10,1000000

The full error log goes like this:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/preprocessing/data.py:586: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/preprocessing/data.py:649: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/preprocessing/data.py:649: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)

I am using only the second and third columns, so there is no need to one-hot encode the first column. The only problem is the DeprecationWarning.

I tried all the suggestions given, but none of them worked.
Any help would be truly appreciated.

So let me get this straight: you have a single feature in your X (the Level column from the dataframe)? — Vivek Kumar, May 12 '17 at 05:14
@VivekKumar I have posted the data in the question, take a look. — BhishanPoudel, May 14 '17 at 02:15

unsupervised_learner · Accepted Answer · 2017-05-12T00:41:55.150

This was a strange one. The code I used to get rid of the deprecation warnings is below, with a slight modification to how you fit StandardScaler() and called transform(). The solution involved painstakingly reshaping and raveling the arrays according to the warning messages. Not sure if this is the best way, but it removed the warnings.

# Importing the libraries
import numpy as np
import pandas as pd
from io import StringIO
from sklearn.preprocessing import StandardScaler

# Setting up data string to be read in as a .csv
data = StringIO("""Position,Level,Salary
Business Analyst,1,45000
Junior Consultant,2,50000
Senior Consultant,3,60000
Manager,4,80000
Country Manager,5,110000
Region Manager,6,150000
Partner,7,200000
Senior Partner,8,300000
C-level,9,500000
CEO,10,1000000""")

dataset = pd.read_csv(data)

# Importing the dataset
#dataset = pd.read_csv('Position_Salaries.csv')

# Deprecation warnings call for reshaping of single feature arrays with reshape(-1,1)
X = dataset.iloc[:, 1:2].values.reshape(-1,1)
y = dataset.iloc[:, 2].values.reshape(-1,1)

# avoid DataConversionError
X = X.astype(float)
y = y.astype(float)

#sc_X = StandardScaler()
#sc_y = StandardScaler()
X_scaler = StandardScaler().fit(X)
y_scaler = StandardScaler().fit(y)

X_scaled = X_scaler.transform(X)
y_scaled = y_scaler.transform(y)

# Fitting SVR to the dataset
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')

# One of the warnings called for ravel()
regressor.fit(X_scaled, y_scaled.ravel())

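# Predicting a new result
# The warnings called for single samples to be reshaped with reshape(1,-1)
X_new = np.array([6.5]).reshape(1,-1)
X_new_scaled = X_scaler.transform(X_new)
y_pred = regressor.predict(X_new_scaled)
# predict() returns a 1-D array; reshape it to a column before undoing the scaling,
# since newer scikit-learn versions expect 2-D input here as well
y_pred = y_scaler.inverse_transform(y_pred.reshape(-1,1))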
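
For reference, here is a small NumPy-only sketch of what the three reshaping calls used above do to an array's shape. The values in `level` and `fake_pred` are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Made-up 1-D data standing in for a single feature column
level = np.array([1, 2, 3, 4, 5], dtype=float)   # shape (5,)

# reshape(-1, 1): many samples, one feature each (what the scalers and SVR want for X)
column = level.reshape(-1, 1)                    # shape (5, 1)

# reshape(1, -1): one sample, used here for the single value to predict
single_sample = np.array([6.5]).reshape(1, -1)   # shape (1, 1)

# ravel(): flatten a column back to 1-D (what SVR.fit wants for y)
flat = column.ravel()                            # shape (5,)

# A 1-D prediction has to go back to a column before inverse scaling
fake_pred = np.array([0.25])                     # stands in for regressor.predict(...)
pred_column = fake_pred.reshape(-1, 1)           # shape (1, 1)

print(column.shape, single_sample.shape, flat.shape, pred_column.shape)
```

The only rule of thumb assumed here is that rows are samples and columns are features, which is the convention the warning messages describe.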

It is the best way. To know why the warnings exist, you can look at [this question](https://stackoverflow.com/questions/41972375/why-does-scikit-learn-demand-different-data-shapes-for-different-regressors/42063867#42063867) — Vivek Kumar, May 12 '17 at 05:24
Thank you, I quite appreciate the reference to the philosophy behind requiring these manipulations. — unsupervised_learner, May 12 '17 at 05:46
The main thing to notice there is that, for a 1-d array, scikit-learn used to infer the layout automatically from both X and y. But when only X is supplied (say, to StandardScaler), it is problematic to infer whether the array is one sample with n features or n samples with a single feature each. It still processes the array, but it now warns you to reshape explicitly before passing it in, because later versions will not infer the layout and will raise an error. — Vivek Kumar, May 12 '17 at 06:42
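
To make the ambiguity described in that comment concrete, here is a minimal sketch showing that the two possible readings of the same 1-D array give very different scaling results. The array `levels` is illustrative; it mirrors the Level column but is built by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

levels = np.arange(1, 11, dtype=float)   # 1-D array, shape (10,)

# Reading 1: ten samples with a single feature each
as_samples = levels.reshape(-1, 1)       # shape (10, 1)
# Reading 2: a single sample with ten features
as_features = levels.reshape(1, -1)      # shape (1, 10)

scaled_samples = StandardScaler().fit_transform(as_samples)
scaled_features = StandardScaler().fit_transform(as_features)

print(scaled_samples.ravel())    # standardized across the ten samples
print(scaled_features.ravel())   # all zeros: each feature has only one value
```

With the second reading every column holds exactly one value, so subtracting the column mean leaves nothing. This is presumably why the library asks the caller to state the layout explicitly rather than guessing.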
