As you are pointing out, there are indeed some differences between the implementations of Ridge() and SVR().
On one side, there's a difference in the loss function, as you can see here (epsilon-insensitive loss and squared epsilon-insensitive loss) vs here (Ridge loss). This is also emphasized in this example from the sklearn documentation, which however compares Kernel Ridge Regression and SVR with a non-linear kernel.
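For reference, here is a minimal sketch of the three loss terms (the epsilon of 0.1 matches SVR's default; the helper names are mine, not sklearn's):

import numpy as np

def epsilon_insensitive(residuals, epsilon=0.1):
    # SVR's default loss: residuals inside the epsilon tube cost nothing
    return np.maximum(0.0, np.abs(residuals) - epsilon)

def squared_epsilon_insensitive(residuals, epsilon=0.1):
    # same tube, but residuals outside it are penalized quadratically
    return np.maximum(0.0, np.abs(residuals) - epsilon) ** 2

def squared_loss(residuals):
    # the data-fit term Ridge minimizes (together with an L2 penalty on the coefficients)
    return residuals ** 2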
In addition to this, the fact that you're using SVR with a polynomial kernel of degree 1 adds a further difference: as you can see here and here (SVR is built on top of the LibSVM library), there's a further parameter (gamma) to be considered (you might set it equal to 1 for convenience; it equals 'scale' by default).
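Note that with degree=1 and coef0=0 the polynomial kernel reduces to gamma * <x, x'>, so gamma=1 makes it coincide with the plain linear kernel. A quick sanity check with sklearn's pairwise kernel helpers:

import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel

rng = np.random.RandomState(0)
X_check = rng.rand(5, 2)
K_poly = polynomial_kernel(X_check, degree=1, gamma=1, coef0=0)  # (gamma * <x, x'> + coef0) ** 1
K_lin = linear_kernel(X_check)                                   # plain <x, x'>
print(np.allclose(K_poly, K_lin))  # True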
Here is the difference in fitting that I could get by adjusting this toy example (with non-tuned parameters). I've also tried to consider LinearSVR(), which has some further differences with respect to SVR(), as you can see e.g. here or here (see also the note after the code below).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import LinearSVR, SVR
import matplotlib.pyplot as plt
np.random.seed(42)
# #############################################################################
# Generate sample data
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()
# #############################################################################
# Add noise to targets
y[::5] += 3 * (0.5 - np.random.rand(8))
# #############################################################################
# Fit regression model
svr_lin = SVR(kernel='linear', C=1, tol=1e-5)  # libsvm, epsilon-insensitive loss
svr_lins = LinearSVR(loss='squared_epsilon_insensitive', C=1, tol=1e-5, random_state=42)  # liblinear
svr_poly = SVR(kernel='poly', C=1, degree=1, gamma=1, tol=1e-5, coef0=0.0)  # degree-1 poly kernel
ridge = Ridge(alpha=1, random_state=42)  # squared loss + L2 penalty
y_lin = svr_lin.fit(X, y).predict(X)
y_lins = svr_lins.fit(X, y).predict(X)
y_poly = svr_poly.fit(X, y).predict(X)
y_ridge = ridge.fit(X, y).predict(X)
coef_y_lin, intercept_y_lin = svr_lin.coef_, svr_lin.intercept_
coef_y_lins, intercept_y_lins = svr_lins.coef_, svr_lins.intercept_
coef_y_ridge, intercept_y_ridge = ridge.coef_, ridge.intercept_
# #############################################################################
# Look at the results
lw = 2
plt.figure(figsize=(10,5))
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X, y_lins, color='navy', lw=lw,
         label='Linear model (LinearSVR) %s, %s' % (coef_y_lins, intercept_y_lins))
plt.plot(X, y_lin, color='red', lw=lw, label='Linear model (SVR) %s, %s' % (coef_y_lin, intercept_y_lin))
plt.plot(X, y_poly, color='cornflowerblue', lw=lw, label='Polynomial model of degree 1 (SVR)')
plt.plot(X, y_ridge, color='g', lw=lw, label='Ridge %s, %s' % (coef_y_ridge, intercept_y_ridge))
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.axis([0, 5, -1, 1.5])
plt.show()
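As a side note on LinearSVR(): with loss='squared_epsilon_insensitive' and epsilon=0, its objective (0.5 * ||w||^2 + C * sum of squared residuals) matches Ridge's (||y - Xw||^2 + alpha * ||w||^2) up to a constant factor, suggesting roughly alpha = 1/(2*C). That mapping is my own back-of-the-envelope reasoning (the remaining gap comes mainly from how liblinear regularizes the intercept), but it can be checked on the same data:

C = 10.0
svr_sq = LinearSVR(loss='squared_epsilon_insensitive', epsilon=0.0, C=C,
                   tol=1e-8, max_iter=100000, random_state=42).fit(X, y)
ridge_eq = Ridge(alpha=1 / (2 * C)).fit(X, y)
print(svr_sq.coef_, svr_sq.intercept_)      # close to...
print(ridge_eq.coef_, ridge_eq.intercept_)  # ...the equivalent Ridge solution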
