Standard deviation of predictive distribution of query points using a pipeline

Question

I am working on a simple regression task and use a pipeline to set the degree of the polynomial features used for the regression (degree = 3). So I define:

pipe = make_pipeline(PolynomialFeatures(3), BayesianRidge())

And then the fitting:

pipe.fit(X_train, y_train)

And finally the prediction bit:

y_pred = pipe.predict(X_test)

BayesianRidge in sklearn has a return_std parameter on its predict method; when it is set to True, predict also returns the standard deviation of the predictive distribution at the query points.

Is there any way to get this standard deviation array when predicting through a pipeline?

score 1 · Accepted Answer · answered Oct 25 '17 at 10:00

You need to install the latest version of scikit-learn from its GitHub repository. After that, you only need partial from functools. The example below is adapted from the one in the Bayesian Ridge Regression docs.

from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from functools import partial

# BayesianRidge is a regressor; the name clf is kept from the docs example
clf = linear_model.BayesianRidge()

# Make the pipeline
pipe = make_pipeline(PolynomialFeatures(3), clf)

# Patch the estimator's predict method so it is always called with return_std=True
clf.predict = partial(clf.predict, return_std=True)

# Fit the pipeline
pipe.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])

# Retrieve the prediction and its standard deviation
y_pred, y_std = pipe.predict([[1, 2]])
# Output: (array([ 1.547614]), array([ 0.25034696]))

Note: Apparently this was a bug in sklearn's pipeline module, as described here. It is now fixed in the latest version.
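On a recent scikit-learn release the patch may not be needed at all, because Pipeline.predict forwards extra keyword arguments to the final step's predict. The sketch below assumes such a release (one that also supports slicing a pipeline) and uses made-up toy data; the variable names are illustrative, not taken from the question's dataset.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data (made up for illustration): a noisy cubic in one feature.
rng = np.random.default_rng(0)
X_train = rng.uniform(-2, 2, size=(50, 1))
y_train = X_train[:, 0] ** 3 + rng.normal(scale=0.1, size=50)
X_test = np.linspace(-2, 2, 5).reshape(-1, 1)

pipe = make_pipeline(PolynomialFeatures(3), BayesianRidge())
pipe.fit(X_train, y_train)

# Extra keyword arguments given to Pipeline.predict are passed on to the
# predict method of the final step (here BayesianRidge).
y_pred, y_std = pipe.predict(X_test, return_std=True)

# Equivalent manual route: run every step except the last as a transformer,
# then call the final estimator's predict directly.
X_poly = pipe[:-1].transform(X_test)
y_pred_manual, y_std_manual = pipe[-1].predict(X_poly, return_std=True)
```

Both routes should give the same arrays; the manual one is mainly useful if the installed version does not forward predict keyword arguments.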

Reference:

How partial works in Python
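As a quick illustration of what partial does in the patch above: it returns a new callable with some arguments already filled in. The predict function here is a hypothetical stand-in written for this sketch, not scikit-learn code.

```python
from functools import partial


def predict(X, return_std=False):
    """Hypothetical stand-in for an estimator's predict method."""
    mean = [2 * x for x in X]
    if return_std:
        # Return a constant, made-up standard deviation per query point.
        return mean, [0.5 for _ in X]
    return mean


# Bind return_std=True once; callers no longer need to pass it.
predict_with_std = partial(predict, return_std=True)

mean, std = predict_with_std([1, 2, 3])
```

Because the pipeline in the answer calls clf.predict(X) with no extra arguments, replacing that attribute with the partial makes every call behave as if return_std=True had been passed.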
