I have a fairly large data set (approximately 35000 x 27). I am running sklearn's SVM classifier with linear and polynomial kernels, and my run times are sometimes 30 minutes or more. Is there a more efficient way to run my SVM?
I have tried removing unnecessary displays of data and different train/test splits, but the duration is always about the same. The Gaussian ("rbf") kernel runs in about 6 minutes, but with much lower accuracy. Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import metrics

# Load the data and check for missing values
proteindata = pd.read_csv("data.csv")
print(np.any(np.isnan(proteindata)))

print(proteindata.shape)
print(proteindata.columns)
print(proteindata.head())

# Separate features and labels, holding out 40% of rows for testing
X = proteindata.drop("Class", axis=1)
y = proteindata["Class"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

# Fit an SVM with a polynomial kernel and evaluate accuracy on the test set
classifier = svm.SVC(kernel='poly')
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
I am not getting any errors, apart from a warning telling me to set gamma explicitly.
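As far as I understand, that warning goes away if gamma is passed explicitly; 'scale' is what newer sklearn versions use as the default:

# Setting gamma explicitly silences the warning about its changing default
classifier = svm.SVC(kernel='poly', gamma='scale')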
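One direction I have been considering for the speed problem is sketched below. I understand that SVM solvers can converge much faster on standardized features, and that for the linear case LinearSVC (which uses liblinear rather than libsvm) scales better to this many samples. The max_iter value here is just my own guess, not something from my current code:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Standardize features, then fit a linear SVM via the liblinear solver;
# max_iter=10000 is an arbitrary choice to give the solver room to converge
fast_clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
fast_clf.fit(X_train, y_train)
print("Accuracy:", fast_clf.score(X_test, y_test))

Would something like this be a reasonable way to cut the run time, or is there a better approach for a data set of this size?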