
I typically get PCA loadings like this:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_t = pca.fit(X).transform(X)
loadings = pca.components_

If I run PCA using a scikit-learn pipeline:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2))
])
X_t = pipeline.fit_transform(X)

is it possible to get the loadings?

Simply trying loadings = pipeline.components_ fails:

AttributeError: 'Pipeline' object has no attribute 'components_'

(Also interested in extracting attributes like coef_ from pipelines.)

desertnaut
lmart999

2 Answers


Did you look at the documentation (http://scikit-learn.org/dev/modules/pipeline.html)? I feel it is pretty clear.

Update: in 0.21 you can use just square brackets:

pipeline['pca']

or indices

pipeline[1]

There are two ways to get to the steps in a pipeline, either using indices or using the string names you gave:

pipeline.named_steps['pca']
pipeline.steps[1][1]

This will give you the PCA object, on which you can get the components. With named_steps you can also use attribute access with a `.`, which allows autocompletion:

pipeline.named_steps.pca.<tab here gives autocomplete>
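Putting that together for the original question, a minimal sketch (using random example data purely for illustration) that pulls the loadings out of the fitted pipeline:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Example data (assumption): 100 samples, 5 features.
X = np.random.RandomState(0).rand(100, 5)

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2))
])
X_t = pipeline.fit_transform(X)

# Reach into the fitted pipeline by step name to get the loadings:
loadings = pipeline.named_steps['pca'].components_
print(loadings.shape)  # (2, 5): one row per component
```

The same pattern works for other fitted attributes, e.g. if the final step were a linear model registered under the (hypothetical) name 'clf', you would use pipeline.named_steps['clf'].coef_.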

Andreas Mueller
  • Right, thanks. Didn't see that (use of `named_steps`) in the [doc here](http://scikit-learn.org/dev/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline). Appreciate that. – lmart999 Mar 03 '15 at 23:52
  • I would like to hijack this answer by adding that, if you have a `regr = TransformedTargetRegressor` over your pipeline then the syntax is not the same, instead you have to access the regressor using `regressor_` before you access the named steps i.e. `regr.regressor_.named_steps['pca'].components_`. – Ari Cooper-Davis Nov 11 '19 at 14:03
  • Weird that it isn't on the docs page, but it is present in the `user guide` section of those docs. – agent18 Jan 11 '21 at 13:17
  • @agent18 Where was it missing? Maybe open an issue (or better yet a PR) to sklearn to update the docs :) – Andreas Mueller Feb 20 '21 at 01:22

Using Neuraxle

Working with pipelines is simpler using Neuraxle. For instance, you can do this:

from neuraxle.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Create and fit the pipeline: 
pipeline = Pipeline([
    StandardScaler(),
    PCA(n_components=2)
])
pipeline, X_t = pipeline.fit_transform(X)

# Get the components: 
pca = pipeline[-1]
components = pca.components_

You can access your PCA in any of these three ways, as you wish:

  • pipeline['PCA']
  • pipeline[-1]
  • pipeline[1]

Neuraxle is a pipelining library built on top of scikit-learn to take pipelines to the next level. It makes it easy to manage spaces of hyperparameter distributions, nested pipelines, saving and reloading, REST API serving, and more. It is also designed to work with deep learning algorithms and to allow parallel computing.

Nested pipelines:

You could have pipelines within pipelines as below.

from neuraxle.base import Identity

# Create and fit the pipeline: 
pipeline = Pipeline([
    StandardScaler(),
    Identity(),
    Pipeline([
        Identity(),  # Note: an Identity step is a step that does nothing. 
        Identity(),  # We use it here for demonstration purposes. 
        Identity(),
        Pipeline([
            Identity(),
            PCA(n_components=2)
        ])
    ])
])
pipeline, X_t = pipeline.fit_transform(X)

Then you'd need to do this:

# Get the components: 
pca = pipeline["Pipeline"]["Pipeline"][-1]
components = pca.components_
Guillaume Chevalier