This is a very straightforward question, but I couldn't find the answer anywhere. I tried Google, TDS, Analytics Vidhya, StackOverflow, and so on. Here's the thing: I'm using Scikit-Learn Pipelines, and I want to see how my data is transformed by the Pipeline. For example, say I had missing values and they have now been filled in. I want to see the filled data, the dummies generated by the encoder, and so on.
1 Answer
There is no generic solution for this kind of inspection, since a pipeline can be composed of very different processing steps, such as imputation, vectorization, feature encoding, and so forth. As a result, the information available differs from step to step.
Therefore, I suppose the best approach is to inspect each step separately, either through the attributes a transformer exposes after it has been fitted or through the dedicated methods it offers for retrieving information.
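As a rough illustration of inspecting each step separately, the sketch below lists the fitted attributes of every step in a small pipeline. It relies on the scikit-learn convention that attributes learned during fitting end in an underscore; the helper `fitted_attributes`, the pipeline `demo`, and the data `X_demo` are hypothetical names made up for this example and are not part of scikit-learn.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with one missing value
X_demo = np.array([[1.0, 7.0], [3.0, np.nan], [2.0, 12.0]])

# Hypothetical two-step pipeline, used only for illustration
demo = Pipeline(steps=[
    ('impute', SimpleImputer(strategy='mean')),
    ('scale', StandardScaler()),
])
demo.fit(X_demo)


def fitted_attributes(estimator):
    """Return the public instance attributes whose names end in '_'.

    By scikit-learn convention these are set during fit(); this helper
    is an illustrative assumption, not a scikit-learn function.
    """
    return sorted(
        name for name in vars(estimator)
        if name.endswith('_') and not name.startswith('_')
    )


# Walk over the steps by name and show what each one learned
for name, step in demo.named_steps.items():
    print(name, fitted_attributes(step))
```

Once you know which attributes exist, you can look them up in the documentation of the respective transformer to see what they mean.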
Let's say you have the following data and pipeline:
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
import numpy as np

# Column 0 is categorical, columns 1 and 2 are numeric; np.nan marks missing values
X = [['Male', 1, 7], ['Female', 3, 5], ['Female', 2, 12], [np.nan, 2, 4], ['Male', np.nan, 15]]

pipeline = Pipeline(steps=[
    # Step 1: fill missing values, with a different strategy per column type
    ('imputation', ColumnTransformer(transformers=[
        ('categorical', SimpleImputer(strategy='constant', fill_value='Missing'), [0]),
        ('numeric', SimpleImputer(strategy='mean'), [1, 2])
    ])),
    # Step 2: one-hot encode every column of the imputed data
    ('encoding', OneHotEncoder(handle_unknown='ignore'))
])

Xt = pipeline.fit_transform(X)
Then it might be best to check the attributes of the specific steps:
>>> print(pipeline['imputation'].transformers_[1][1].statistics_) # computed mean for features 1 and 2
[2. 8.6]
>>> print(pipeline['encoding'].get_feature_names_out()) # names of encoded categories (get_feature_names() in scikit-learn < 1.0)
[... 'x2_Female' 'x2_Male' 'x2_Missing']
This of course assumes that you know how your pipeline is composed, which attributes each step exposes after fitting, and which other methods it offers (the scikit-learn documentation is the best place to look these up).
