I'm having a terrible time resolving the warning described in this question. Unfortunately, following the suggested fixes there hasn't solved my problem.
Apparently I'm feeding a 1D array into svm.SVC's predict method, and I'm getting deprecation warnings. I just can't figure out what I'm doing wrong, and I'm hoping someone can help me fix my code. I'm sure it's a small correction I'm missing.
I'm using Python 2.7.
I start with a DataFrame, data_df (dimensions reduced here for clarity, but the code and structure are accurate):
   Price/Sales  Price/Book  Profit Margin  Operating Margin
0         2.80        6.01          29.56             11.97
1         2.43        4.98          25.56              6.20
2         1.61        3.24           4.86              5.38
3         1.52        3.04           4.86              5.38
4         3.31        4.26           6.38              3.58
I change the dataframe to a numpy array:
X = data_df.values
which gives me:
[[ 2.8, 6.01, 29.56, 11.97],
[ 2.43, 4.98, 25.56, 6.2 ],
[ 1.61, 3.24, 4.86, 5.38],
[ 1.52, 3.04, 4.86, 5.38],
[ 3.31, 4.26, 6.38, 3.58]]
Then I center and normalize my data:
X = preprocessing.scale(X)
which gives me:
[[ 0.67746872 1.5428404 1.39746257 1.90843628]
[ 0.13956437 0.61025495 1.03249454 -0.10540376]
[-1.05254797 -0.96518067 -0.85621499 -0.3915994 ]
[-1.18338957 -1.14626523 -0.85621499 -0.3915994 ]
[ 1.41890444 -0.04164945 -0.71752714 -1.01983373]]
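For reference, preprocessing.scale with its default arguments centers each column to zero mean and divides by the column's population standard deviation (ddof=0), which is what the values above reflect. A minimal NumPy sketch of the same computation, assuming the five-row sample shown here (the helper name standardize is hypothetical, not part of scikit-learn):

```python
import numpy as np

# The five sample rows from data_df (columns: Price/Sales, Price/Book,
# Profit Margin, Operating Margin)
X = np.array([[2.80, 6.01, 29.56, 11.97],
              [2.43, 4.98, 25.56, 6.20],
              [1.61, 3.24, 4.86, 5.38],
              [1.52, 3.04, 4.86, 5.38],
              [3.31, 4.26, 6.38, 3.58]])

def standardize(X):
    """Hypothetical helper: subtract each column's mean and divide by
    its population standard deviation (ddof=0), column by column."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

X_scaled = standardize(X)
print(X_scaled.round(4))
```

Note that this only changes the values; the array keeps its 2-D shape (5, 4).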
My y is a series of 0's and 1's:
[0, 0, 1, 0, 1]
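For training, these are the shapes scikit-learn's fit expects: X is 2-D with one row per observation and one column per feature, and y is 1-D with one label per row. A quick NumPy check, assuming the five scaled rows and labels above:

```python
import numpy as np

# Five observations x four features (the scaled values shown above)
X = np.array([[ 0.67746872,  1.5428404 ,  1.39746257,  1.90843628],
              [ 0.13956437,  0.61025495,  1.03249454, -0.10540376],
              [-1.05254797, -0.96518067, -0.85621499, -0.3915994 ],
              [-1.18338957, -1.14626523, -0.85621499, -0.3915994 ],
              [ 1.41890444, -0.04164945, -0.71752714, -1.01983373]])
y = np.array([0, 0, 1, 0, 1])

print(X.shape)  # (5, 4) -> (n_samples, n_features)
print(y.shape)  # (5,)   -> (n_samples,)
```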
The actual data set is about 10,000 observations. I use the following code to select subsets for training, testing, and checking accuracy:
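from sklearn import svm

test_size = 500  # hold out the last 500 observations for testing
clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X[:-test_size], y[:-test_size])  # train on all other rows
correct_count = 0
for x in range(1, test_size + 1):
    # X[-x] is a single row of X, i.e. a 1-D array of shape (n_features,)
    if clf.predict(X[-x])[0] == y[-x]:
        correct_count += 1
# 100.0 forces float division; in Python 2.7, correct_count / test_size
# with two ints truncates to 0 unless every prediction is correct
print("Accuracy: %.2f" % (100.0 * correct_count / test_size))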
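To see exactly which rows the loop touches: with x in range(1, test_size + 1), X[-x] walks backwards from the last row, so the loop visits the rows in X[-test_size:], the same block excluded from training by X[:-test_size], one row at a time. A small sketch with a toy 10-row array and test_size = 3 (both made up for illustration):

```python
import numpy as np

# Toy data, made up for illustration: 10 rows x 2 features
X = np.arange(20).reshape(10, 2)
test_size = 3

train = X[:-test_size]   # rows 0..6, used for fitting
test = X[-test_size:]    # rows 7..9, held out

# The rows the loop passes to predict, one at a time
visited = [X[-x] for x in range(1, test_size + 1)]

# Each visited row is 1-D with shape (n_features,)
print([row.shape for row in visited])
# The visited rows are exactly the held-out block, last row first
print(np.array_equal(np.array(visited), test[::-1]))
```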
The test set of factors I feed into clf.predict
(X[-x] for x = 1 to test_size) throws the following warning:
C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
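The warning contrasts a 1-D input with two 2-D layouts: X.reshape(-1, 1) (one feature, many samples) and X.reshape(1, -1) (one sample, many features). A short NumPy sketch of the shapes involved, using a made-up 5 x 4 array whose last row plays the role of X[-x] in the loop:

```python
import numpy as np

X = np.arange(20, dtype=float).reshape(5, 4)  # toy 5 x 4 array

row = X[-1]                      # integer indexing drops the row axis
print(row.shape)                 # (4,)   -> 1-D input, as named in the warning
print(row.reshape(1, -1).shape)  # (1, 4) -> one sample, four features
print(row.reshape(-1, 1).shape)  # (4, 1) -> four samples, one feature
print(X[-1:].shape)              # (1, 4) -> slicing keeps the row axis
```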
The code runs: I get predictions and can calculate accuracy, but the warning is still raised.
As far as I can tell from searching, and from the question referenced above, my data IS in the proper form. What am I missing?
Thanks in advance for your help.