I was experimenting with automatic feature engineering and feature selection, so I used the Boston house price dataset available in sklearn.
from sklearn.datasets import load_boston
import pandas as pd
data = load_boston()
x = data.data
y = data.target
y = pd.DataFrame(y)
Then I applied the autofeat feature transformation library to the dataset.
import autofeat as af
clf = af.AutoFeatRegressor()
df = clf.fit_transform(x,y)
df = pd.DataFrame(df)
After this, I used SelectKBest with chi2 to score each feature in relation to the label.
from sklearn.feature_selection import SelectKBest, chi2
X_new = SelectKBest(chi2, k=20)
X_new_done = X_new.fit_transform(df,y)
dfscores = pd.DataFrame(X_new.scores_)
dfcolumns = pd.DataFrame(X_new_done.columns)
featureScores = pd.concat([dfcolumns,dfscores],axis=1)
featureScores.columns = ['Specs','Score']
print(featureScores.nlargest(10,'Score'))
This gave the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-b0fa1556bdef> in <module>()
1 from sklearn.feature_selection import SelectKBest, chi2
2 X_new = SelectKBest(chi2, k=20)
----> 3 X_new_done = X_new.fit_transform(df,y)
4 dfscores = pd.DataFrame(X_new.scores_)
5 dfcolumns = pd.DataFrame(X_new_done.columns)
ValueError: Input X must be non-negative.
My dataset contains a few negative numbers. How can I overcome this problem?
Note: df contains no transformations of y; it only contains transformations of x.
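For reference, one possible way around the error is sketched below. sklearn documents chi2 as a test between non-negative features and a class label, so it may not be the right scorer for a continuous target such as house prices; f_regression is a scorer intended for regression targets and does not require non-negative input. The sketch uses synthetic data as a stand-in for the autofeat output (the array X, the target y, and the names f0 to f5 are all made up for illustration), so it is an assumption that the same approach carries over to the real df.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in for the transformed features (hypothetical data,
# not the Boston dataset and not real autofeat output).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # standard normal draws, so negatives are present
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)
feature_names = [f"f{i}" for i in range(X.shape[1])]

# f_regression scores each feature against a continuous target
# and accepts negative feature values, unlike chi2.
selector = SelectKBest(score_func=f_regression, k=2)
X_selected = selector.fit_transform(X, y)

# Pair every feature name with its score and list the highest first.
ranked = sorted(zip(feature_names, selector.scores_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")

# fit_transform returns a plain NumPy array by default, so the names of
# the kept columns are recovered from the selector's support mask.
selected_names = [feature_names[i] for i in selector.get_support(indices=True)]
print(selected_names)
```

Because fit_transform returns a NumPy array by default, the selected column names are recovered through get_support rather than a .columns attribute. If non-linear relationships matter, mutual_info_regression could be passed as score_func instead.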