Questions tagged [scikit-learn-pipeline]
92 questions
43
votes
4 answers
Invalid parameter for sklearn estimator pipeline
I am implementing an example from the O'Reilly book "Introduction to Machine Learning with Python", using Python 2.7 and sklearn 0.16.
The code I am using:
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
param_grid =…
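The excerpt is cut off before the parameter grid, so the actual cause is not visible here. One frequent source of an "Invalid parameter" error with make_pipeline is a grid key that does not match the auto-generated step names. A minimal sketch of how to list the valid keys (the grid values below are illustrative assumptions, not taken from the question):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())

# make_pipeline derives each step name from the lowercased class name.
step_names = [name for name, _ in pipe.steps]

# Every valid grid key follows the pattern <step name>__<parameter name>.
valid_keys = sorted(pipe.get_params().keys())

# Hypothetical grid; the values are assumptions chosen for illustration.
param_grid = {
    "logisticregression__C": [0.1, 1, 10],
    "tfidfvectorizer__ngram_range": [(1, 1), (1, 2)],
}
```

Any key in param_grid that is missing from valid_keys would trigger the error when the search is fitted.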

sudo_coffee
- 888
- 1
- 12
- 26
41
votes
4 answers
Return coefficients from Pipeline object in sklearn
I've fit a Pipeline object with RandomizedSearchCV
pipe_sgd = Pipeline([('scl', StandardScaler()),
('clf', SGDClassifier(n_jobs=-1))])
param_dist_sgd = {'clf__loss': ['log'],
'clf__penalty': [None, 'l1', 'l2',…
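The rest of the question is truncated. As a hedged sketch of one way to reach the fitted coefficients, assuming the search was run with the default refit=True: the best pipeline is available as best_estimator_, and its steps can be looked up by name. The data and the alpha values are assumptions made for this example, and the loss setting from the excerpt is left out because its accepted values differ between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the asker's dataset (assumption).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

pipe_sgd = Pipeline([("scl", StandardScaler()),
                     ("clf", SGDClassifier(random_state=0))])

# Illustrative search space; not the distribution from the question.
param_dist = {"clf__alpha": [1e-4, 1e-3, 1e-2]}

search = RandomizedSearchCV(pipe_sgd, param_dist, n_iter=2, cv=3,
                            random_state=0)
search.fit(X, y)

# best_estimator_ is the refitted Pipeline; named_steps gives the classifier.
coefs = search.best_estimator_.named_steps["clf"].coef_
```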

spies006
- 2,867
- 2
- 19
- 28
25
votes
2 answers
Is it possible to toggle a certain step in sklearn pipeline?
I wonder if we can set up an "optional" step in sklearn.pipeline. For example, for a classification problem, I may want to try an ExtraTreesClassifier with AND without a PCA transformation ahead of it. In practice, it might be a pipeline with an…
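One possible approach, sketched under the assumption of a reasonably recent scikit-learn release: a step can be replaced by the string 'passthrough', and the step itself can be a searched parameter, so a grid search can compare the pipeline with and without PCA. The data and hyperparameter values are made up for this example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data (assumption) so the sketch runs on its own.
X, y = make_classification(n_samples=120, n_features=6, random_state=0)

pipe = Pipeline([
    ("pca", PCA(n_components=3)),
    ("clf", ExtraTreesClassifier(n_estimators=20, random_state=0)),
])

# The whole "pca" step is a parameter: either skipped or a real PCA.
param_grid = {"pca": ["passthrough", PCA(n_components=3)]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

# One candidate with PCA, one without.
n_candidates = len(search.cv_results_["params"])
```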

dolaameng
- 1,397
- 2
- 17
- 24
10
votes
3 answers
How to gridsearch over transform arguments within a pipeline in scikit-learn
My goal is to use one model to select the most important variables and another model to use those variables to make predictions. In the example below I am using two instances of RandomForestClassifier, but the second model could be any other…
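A hedged sketch of the general idea, not the asker's code: if the selecting model is wrapped in SelectFromModel, its own arguments and those of the inner estimator can be addressed in the grid with the double-underscore syntax. The step names, data and grid values are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=150, n_features=8, n_informative=3,
                           random_state=0)

pipe = Pipeline([
    # First model: only used to rank and select features.
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=20, random_state=0))),
    # Second model: predicts from the selected features.
    ("clf", RandomForestClassifier(n_estimators=20, random_state=0)),
])

param_grid = {
    # Argument of the transformer itself.
    "select__threshold": ["mean", "median"],
    # Argument of the estimator nested inside the transformer.
    "select__estimator__max_depth": [2, None],
}

search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
```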

Jason Sanchez
- 477
- 2
- 6
- 19
5
votes
2 answers
How to create pandas output for custom transformers?
scikit-learn 1.2.0 introduced many changes, including pandas output support for all of the built-in transformers, but how can I use it in a custom transformer?
Here is my custom transformer, which is a standard scaler:
from sklearn.base import…

Armando Bridena
- 237
- 3
- 10
5
votes
2 answers
Sklearn Pipeline: How to build for kmeans, clustering text?
I have text as shown:
list1 = ["My name is xyz", "My name is pqr", "I work in abc"]
The above will be the training set for clustering text using kmeans.
list2 = ["My name is xyz", "I work in abc"]
The above is my test set.
I have built a vectorizer…
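The remainder of the question is cut off. A minimal sketch of how the vectorizer and KMeans could be chained in one Pipeline, assuming TF-IDF features and two clusters (both are assumptions, since the excerpt names neither):

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

list1 = ["My name is xyz", "My name is pqr", "I work in abc"]
list2 = ["My name is xyz", "I work in abc"]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # n_clusters=2 is an illustrative choice, not from the question.
    ("kmeans", KMeans(n_clusters=2, n_init=10, random_state=0)),
])

# Clustering is unsupervised, so fit takes only the training texts.
pipe.fit(list1)

# The test texts go through the same fitted vectorizer before prediction.
labels = pipe.predict(list2)
```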

user1452759
- 8,810
- 15
- 42
- 58
4
votes
1 answer
How can I check the changes made by Scikit-Learn Pipeline?
This is a very straightforward question, but I couldn't find the answer anywhere. I tried Google, TDS, Analytics Vidhya, StackOverflow, etc. Here's the thing: I'm using Scikit-Learn Pipelines, but I wanted to see how my data was treated by the…
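One way to look at the intermediate data, sketched with made-up steps and synthetic data: a fitted Pipeline can be sliced, and the slice without the final estimator can transform the input on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()),
                 ("clf", LogisticRegression())]).fit(X, y)

# pipe[:-1] is a sub-pipeline holding every step except the classifier,
# so this is the data exactly as the classifier received it.
X_before_clf = pipe[:-1].transform(X)
```

Individual fitted steps can also be inspected through pipe.named_steps.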

Yuxxxxxx
- 203
- 1
- 5
3
votes
0 answers
How to use a different feature set for each estimator in a multi-estimator sklearn pipeline
Below is an example sklearn pipeline. There are two sklearn StackingClassifiers:
stackingclassifier1 with base classifier as RandomForestClassifier & stackingclassifier2 as Meta Learner.
stackingclassifier2 with base classifier as…

Jyoti Hassanandani
- 91
- 5
3
votes
1 answer
SimpleImputer object has no attribute _fit_dtype
I have a trained scikit-learn model pipeline (including a SimpleImputer) that I'm trying to put into production. However, I get the following error when running it in the production environment.
SimpleImputer object has no attribute _fit_dtype
How…

Jakob
- 663
- 7
- 25
3
votes
1 answer
How can I get features names when there is a preprocessor before feature selection?
I tried checking some posts like this, this and this but I still couldn't find what I need.
These are the transformations I'm doing:
cat_transformer = Pipeline(steps=[("encoder", TargetEncoder())])
num_transformer = Pipeline(
steps=[
…

dsbr__0
- 241
- 1
- 3
3
votes
0 answers
How to fit Sklearn Pipeline on Catboost Classifier with Embedding features
I have a Catboost Classifier that predicts on some embedding features, and AFAIK these embedding features can only be specified through Pools (meaning I have to create a pool and then pass the pool for the Catboost classifier's .fit method in order…

Edouard Malet
- 51
- 1
3
votes
1 answer
How to train an sklearn pipeline in AWS?
Working within a Sagemaker Jupyter Notebook I have an XGBoost pipeline which transforms my data and also runs some feature selection:
steps_xgb = [('scaler', MinMaxScaler()),
('feature_reduction', SelectKBest(mutual_info_classif)),
…

quantumofnolace
- 125
- 7
2
votes
2 answers
How can I use sklearn's make_column_selector to select all valid datetime columns?
I want to select columns based on their datetime data types. My DataFrame has for example columns with types np.dtype('datetime64[ns]'), np.datetime64 and 'datetime64[ns, UTC]'.
Is there a generic way to select all columns with a datetime…
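A possible approach, offered as a sketch: make_column_selector forwards dtype_include to pandas' select_dtypes, which accepts 'datetime' for timezone-naive columns and 'datetimetz' for timezone-aware ones. The column names below are invented for the example.

```python
import pandas as pd
from sklearn.compose import make_column_selector

dates = pd.to_datetime(["2021-01-01", "2021-01-02"])
df = pd.DataFrame({
    "naive": dates,                     # timezone-naive datetime column
    "aware": dates.tz_localize("UTC"),  # timezone-aware datetime column
    "value": [1.0, 2.0],
})

# Both selectors are needed: each one matches only its own kind of column.
selector = make_column_selector(dtype_include=["datetime", "datetimetz"])
datetime_cols = selector(df)
```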

JAdel
- 1,309
- 1
- 7
- 24
2
votes
2 answers
Error finding attribute `feature_names_in_` that exists in docs
I'm getting the error AttributeError: 'LogisticRegression' object has no attribute 'feature_names_in_' even though that attribute is written in the docs.
I'm on scikit-learn version 1.0.2.
I created an object LogisticRegression and I am trying to…

sanderlin2013
- 31
- 6
2
votes
1 answer
How to preserve column names in scikit-learn ColumnTransformer?
I'm creating some pipelines using scikit-learn, but I'm having some trouble keeping the variable names as the original names, and not in the transformer_name__feature_name format.
This is the scenario:
I have a set of transformers, both custom and…

Rodrigo A
- 657
- 7
- 23