
I am trying to imitate some code I found on Kaggle for plotting SVM decision boundaries. I am using my own dataset with 608 samples, 10 features and 2 classes; the two classes indicate, for instance, whether or not a person is diabetic. I copied the SVM part of the code from the notebook (it is near the very bottom of the page), where it covers decision boundary visualisation. Here's the link to my reference.

However, I get an error saying "X must be a Numpy array". Can someone explain what this means?

The code below is what I've done. Note that my dataset has already been normalised, and I'm splitting the data in a 70:30 ratio.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from mlxtend.plotting import plot_decision_regions

# Load the (already normalised) dataset: 10 feature columns plus a binary TARGET column
autism = pd.read_csv('diabetec.csv')

x = autism.drop(['TARGET'], axis=1)  # pandas DataFrame of features
y = autism['TARGET']                 # pandas Series of labels

# 70:30 train/test split
x_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=1)

# Class labels as an integer NumPy array
t = np.array(y_train).astype(int)

clf_svm = SVC(C=1.3, gamma=0.8, kernel='rbf')
clf_svm.fit(x_train, t)

plt.figure(figsize=[15, 10])
# x_train is still a pandas DataFrame at this point
plot_decision_regions(x_train, t, clf=clf_svm, hide_spines=False,
                      colors='purple,limegreen', markers=['x', 'o'])
plt.title('Support Vector Machine')
plt.show()
```

Jorge Heigl
Falady
  • Can you provide some extra information: in which function does the error occur, and a sample of your data (a few lines, to get the format right)? – Eypros Apr 17 '19 at 07:15
  • It says the error is in line 16, where I declared the 'x' variable. You can have a look at my full code and file here: https://github.com/falady/svm_classification.git – Falady Apr 17 '19 at 07:18
  • Yes, you need to set x and y as np arrays and everything should work just fine. – DaveIdito Apr 17 '19 at 07:18
  • @DaveIdito does that mean I need to set, for example, y as y = np.autism['TARGET']? – Falady Apr 17 '19 at 07:22
  • I think setting the two lines as x = np.array(autism.drop(['TARGET'], axis=1)) and y = np.array(autism['TARGET']) will fix the issue. Also, which line are you getting the error at? – DaveIdito Apr 17 '19 at 07:25
  • I got a new error, "Filler values must be provided when X has more than 2 training features", when I changed the code to what you suggested. It says it's in line 25, at the plot_decision_regions call. – Falady Apr 17 '19 at 07:30

1 Answer


plot_decision_regions expects a NumPy array, but x_train is a pandas DataFrame. Try passing x_train.values instead, i.e.

```python
plot_decision_regions(x_train.values, t, clf = clf_svm, ...
```
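A minimal illustration of the answer's point, assuming a hypothetical 20-row random dataset in place of diabetec.csv (the column names f0 to f9 are made up). Dropping a column from a DataFrame gives another DataFrame, and train_test_split returns DataFrames when given DataFrames, so x_train is not an ndarray. Converting with `.values`, or the equivalent `.to_numpy()`, gives the plain ndarray that mlxtend checks for. The labels are cast to `int` as in the question, since mlxtend appears to expect integer class labels as well.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for diabetec.csv: 10 normalised features plus a TARGET column
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((20, 10)), columns=[f"f{i}" for i in range(10)])
df["TARGET"] = rng.integers(0, 2, size=20)

x = df.drop(columns=["TARGET"])   # still a pandas DataFrame
y = df["TARGET"]                  # still a pandas Series

# Convert both to plain NumPy arrays before handing them to plot_decision_regions
X_np = x.to_numpy()               # same data as x.values, shape (n_samples, 10)
y_np = y.to_numpy().astype(int)   # integer class labels

print(type(X_np).__name__, X_np.shape)
```

Only the DataFrame needs converting: calling `.values` on something that is already an ndarray (such as `t` in the question) raises the "'numpy.ndarray' object has no attribute 'values'" error reported in the comments.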
JARS
  • It says 'numpy.ndarray' object has no attribute 'values'. – Falady Apr 17 '19 at 07:31
  • Sorry, I meant just x_train.values; keep `t` as it is. I see in your code that t is already a NumPy array. – JARS Apr 17 '19 at 07:32
  • I think the problem was only in the x declaration, as I had already declared y using numpy.array. And no, I still get the same error as the one in my earlier comment. – Falady Apr 17 '19 at 07:35
  • I noticed that when I tried adding .values to x_train, 'filler_feature_values=' came up as a hint. Are there any values I should declare first? And what does filler_feature_values do? – Falady Apr 17 '19 at 07:38
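The "Filler values" error arises because the SVM was trained on 10 features while the plot can only show two. As far as I understand mlxtend's `plot_decision_regions`, `feature_index` selects the two plotted columns, and `filler_feature_values` must be a dict mapping every other column index to a fixed value, so the boundary is drawn as a 2-D slice with the remaining features held constant. The optional `filler_feature_ranges` dict (same keys) limits the scattered training points to those lying within that distance of each filler value. The sketch below builds both dicts; pinning each hidden feature at its training mean with a width of one standard deviation is an arbitrary assumption, not something from the original code, and `filler_dicts` is a hypothetical helper name.

```python
import numpy as np

def filler_dicts(X, feature_index=(0, 1), width_in_std=1.0):
    """Build filler values/ranges for every column of X not in feature_index.

    Assumption: each hidden feature is fixed at its mean, and the scatter
    keeps points within width_in_std standard deviations of that mean.
    """
    X = np.asarray(X, dtype=float)
    hidden = [i for i in range(X.shape[1]) if i not in feature_index]
    values = {i: float(X[:, i].mean()) for i in hidden}
    ranges = {i: float(width_in_std * X[:, i].std()) for i in hidden}
    return values, ranges

# Hypothetical stand-in for 10 normalised training features
rng = np.random.default_rng(0)
X_train = rng.random((30, 10))
values, ranges = filler_dicts(X_train, feature_index=(0, 1))
print(sorted(values))  # → [2, 3, 4, 5, 6, 7, 8, 9]
```

With the question's variables, this would be `values, ranges = filler_dicts(x_train.values)` followed by roughly `plot_decision_regions(x_train.values, t, clf=clf_svm, feature_index=(0, 1), filler_feature_values=values, filler_feature_ranges=ranges)`. Different filler values give different slices, so the drawn boundary only describes the model at that particular setting of the other eight features.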