I applied one-hot encoding to my dataset before training my SVM classifier, which increased the number of features in the training set to 982. But when I predict on the test dataset, which has 7 features, I get the error "X has 7 features per sample; expecting 982". I don't understand how to increase the number of features in the test dataset.

My code is:
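import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

df = pd.read_csv('train.csv', header=None)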
features = df.iloc[:,:-1].values
labels = df.iloc[:,-1].values
encode = LabelEncoder()
features[:,2] = encode.fit_transform(features[:,2])
features[:,3] = encode.fit_transform(features[:,3])
features[:,4] = encode.fit_transform(features[:,4])
features[:,5] = encode.fit_transform(features[:,5])

df1 = pd.DataFrame(features)
#--------------------------- ONE HOT ENCODING --------------------------------#

hotencode = OneHotEncoder(categorical_features=[2])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[14])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[37])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[466])
features = hotencode.fit_transform(features).toarray()
X = np.array(features)
y = np.array(labels)

clf = svm.LinearSVC()
clf.fit(X,y)
d_test = pd.read_csv('query.csv')
Z_test =np.array(d_test)
confidence = clf.predict(Z_test)
print("The query image belongs to Class ")
print(confidence)

Test dataset (query.csv):
1   0.076   1   3232236298  2886732679  3128    60604
Saveen
arindom

1 Answer

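The short answer: you need to apply the same OHE transform (or LE+OHE in your case) to the test set. That means reusing the encoders that were fitted on the training data and calling transform on the test rows rather than fit_transform, so the test set ends up with the same 982 columns the classifier was trained on.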

For good advice, see "Scikit Learn OneHotEncoder fit and transform Error: ValueError: X has different shape than during fitting" or "How to deal with imputation and hot one encoding in pandas?"
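To illustrate the idea, here is a minimal sketch, not the asker's exact code. It assumes a recent scikit-learn where `ColumnTransformer` is available, and it uses a small made-up training table in place of train.csv (the values, the labels "a"/"b", and the variable names are all hypothetical). The encoder is fitted once on the training data inside a pipeline, and the same fitted encoder is reused at prediction time, so the 7-column query row is expanded to the same width as the training matrix.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

# Hypothetical stand-in for train.csv: 7 feature columns, labels kept separately.
X_train = np.array([
    [1, 0.076, 1, 3232236298, 2886732679, 3128, 60604],
    [0, 0.500, 2, 3232236299, 2886732680,   80,  1000],
    [1, 0.100, 1, 3232236298, 2886732679, 3128, 60604],
    [0, 0.900, 3, 3232236300, 2886732681,  443,  2000],
    [1, 0.300, 2, 3232236299, 2886732680,   80,  1000],
    [0, 0.700, 3, 3232236300, 2886732681,  443,  2000],
])
y_train = np.array(["a", "b", "a", "b", "a", "b"])

# One-hot encode columns 2..5 (the columns encoded in the question) and pass
# the remaining columns through unchanged. handle_unknown="ignore" encodes
# categories never seen during fit as all zeros instead of raising an error.
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), [2, 3, 4, 5])],
    remainder="passthrough",
)

# The pipeline fits the encoder on the training data only, then the classifier.
model = make_pipeline(encoder, LinearSVC(random_state=0, max_iter=10000))
model.fit(X_train, y_train)

# The query row from query.csv: still 7 raw columns.
X_test = np.array([[1, 0.076, 1, 3232236298, 2886732679, 3128, 60604]])

# predict() applies the already fitted encoder (transform, not fit_transform).
pred = model.predict(X_test)

# Both matrices have the same number of columns after encoding.
fitted_encoder = model.named_steps["columntransformer"]
n_train_cols = fitted_encoder.transform(X_train).shape[1]
n_test_cols = fitted_encoder.transform(X_test).shape[1]
print(n_train_cols, n_test_cols, pred)
```

Wrapping the encoder and the classifier in one pipeline is a design choice rather than a requirement: keeping a reference to the fitted encoder and calling its `transform` on the test data by hand would achieve the same thing.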

Mischa Lisovyi