As you are pointing out, there are indeed some differences between the implementations of Ridge() and SVR().
On one side, there's a difference in the loss function, as you can see here (epsilon-insensitive loss and squared epsilon-insensitive loss) vs here (Ridge loss). This is also emphasized in this example from the sklearn documentation, which however compares Kernel Ridge Regression and SVR with a non-linear kernel.
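For reference, here is a minimal sketch of the three loss terms (the epsilon of 0.1 matches SVR's default; the helper names are mine, not sklearn's):

import numpy as np

def epsilon_insensitive(residuals, epsilon=0.1):
    # SVR's default loss: residuals inside the epsilon tube cost nothing
    return np.maximum(0.0, np.abs(residuals) - epsilon)

def squared_epsilon_insensitive(residuals, epsilon=0.1):
    # same tube, but residuals outside it are penalized quadratically
    return np.maximum(0.0, np.abs(residuals) - epsilon) ** 2

def squared_loss(residuals):
    # the data-fit term Ridge minimizes (together with an L2 penalty on the coefficients)
    return residuals ** 2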
In addition to this, the fact that you're using SVR with a polynomial kernel of degree 1 adds a further difference: as you can see here and here (SVR is built on top of the LibSVM library), there's a further parameter (gamma) to be considered (you might set it equal to 1 for convenience; it equals 'scale' by default).
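Note that with degree=1 and coef0=0 the polynomial kernel reduces to gamma * <x, x'>, so gamma=1 makes it coincide with the plain linear kernel. A quick sanity check with sklearn's pairwise kernel helpers:

import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel

rng = np.random.RandomState(0)
X_check = rng.rand(5, 2)
K_poly = polynomial_kernel(X_check, degree=1, gamma=1, coef0=0)  # (gamma * <x, x'> + coef0) ** 1
K_lin = linear_kernel(X_check)                                   # plain <x, x'>
print(np.allclose(K_poly, K_lin))  # True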
Here is the difference in fitting that I could get by adjusting this toy example (with non-tuned parameters). I've also tried to consider LinearSVR(), which has some further differences with respect to SVR(), as you can see e.g. here or here (see also the note after the code below).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import LinearSVR, SVR
import matplotlib.pyplot as plt
np.random.seed(42)
# #############################################################################
# Generate sample data
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()
# #############################################################################
# Add noise to targets
y[::5] += 3 * (0.5 - np.random.rand(8))
# #############################################################################
# Fit regression model
svr_lin = SVR(kernel='linear', C=1, tol=1e-5)  # libsvm, epsilon-insensitive loss
svr_lins = LinearSVR(loss='squared_epsilon_insensitive', C=1, tol=1e-5, random_state=42)  # liblinear
svr_poly = SVR(kernel='poly', C=1, degree=1, gamma=1, tol=1e-5, coef0=0.0)  # degree-1 poly kernel
ridge = Ridge(alpha=1, random_state=42)  # squared loss + L2 penalty
y_lin = svr_lin.fit(X, y).predict(X)
y_lins = svr_lins.fit(X, y).predict(X)
y_poly = svr_poly.fit(X, y).predict(X)
y_ridge = ridge.fit(X, y).predict(X)
coef_y_lin, intercept_y_lin = svr_lin.coef_, svr_lin.intercept_
coef_y_lins, intercept_y_lins = svr_lins.coef_, svr_lins.intercept_
coef_y_ridge, intercept_y_ridge = ridge.coef_, ridge.intercept_
# #############################################################################
# Look at the results
lw = 2
plt.figure(figsize=(10,5))
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X, y_lins, color='navy', lw=lw,
         label='Linear model (LinearSVR) %s, %s' % (coef_y_lins, intercept_y_lins))
plt.plot(X, y_lin, color='red', lw=lw, label='Linear model (SVR) %s, %s' % (coef_y_lin, intercept_y_lin))
plt.plot(X, y_poly, color='cornflowerblue', lw=lw, label='Polynomial model of degree 1 (SVR)')
plt.plot(X, y_ridge, color='g', lw=lw, label='Ridge %s, %s' % (coef_y_ridge, intercept_y_ridge))
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.axis([0, 5, -1, 1.5])
plt.show()
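As a side note on LinearSVR(): with loss='squared_epsilon_insensitive' and epsilon=0, its objective (0.5 * ||w||^2 + C * sum of squared residuals) matches Ridge's (||y - Xw||^2 + alpha * ||w||^2) up to a constant factor, suggesting roughly alpha = 1/(2*C). That mapping is my own back-of-the-envelope reasoning (the remaining gap comes mainly from how liblinear regularizes the intercept), but it can be checked on the same data:

C = 10.0
svr_sq = LinearSVR(loss='squared_epsilon_insensitive', epsilon=0.0, C=C,
                   tol=1e-8, max_iter=100000, random_state=42).fit(X, y)
ridge_eq = Ridge(alpha=1 / (2 * C)).fit(X, y)
print(svr_sq.coef_, svr_sq.intercept_)      # close to...
print(ridge_eq.coef_, ridge_eq.intercept_)  # ...the equivalent Ridge solution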
