Questions tagged [scikit-learn-pipeline]

92 questions
43
votes
4 answers

Invalid parameter for sklearn estimator pipeline

I am implementing an example from the O'Reilly book "Introduction to Machine Learning with Python", using Python 2.7 and sklearn 0.16. The code I am using: pipe = make_pipeline(TfidfVectorizer(), LogisticRegression()) param_grid =…
sudo_coffee
41
votes
4 answers

return coefficients from Pipeline object in sklearn

I've fit a Pipeline object with RandomizedSearchCV pipe_sgd = Pipeline([('scl', StandardScaler()), ('clf', SGDClassifier(n_jobs=-1))]) param_dist_sgd = {'clf__loss': ['log'], 'clf__penalty': [None, 'l1', 'l2',…
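One common way to reach the coefficients after such a search is through `best_estimator_` and the pipeline's `named_steps`. The sketch below is only an illustration: the synthetic data, the shortened parameter grid (`clf__penalty`, `clf__alpha`) and the variable names `search`, `best_clf` and `coefficients` are assumptions, not the asker's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption, stands in for the real data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe_sgd = Pipeline([
    ("scl", StandardScaler()),
    ("clf", SGDClassifier(random_state=0)),
])

# Illustrative search space, deliberately smaller than the one in the question.
param_dist_sgd = {
    "clf__penalty": ["l1", "l2"],
    "clf__alpha": [1e-4, 1e-3, 1e-2],
}

search = RandomizedSearchCV(pipe_sgd, param_dist_sgd, n_iter=4, cv=3, random_state=0)
search.fit(X, y)

# best_estimator_ is the refitted Pipeline; named_steps exposes each step by name.
best_clf = search.best_estimator_.named_steps["clf"]
coefficients = best_clf.coef_
print(coefficients.shape)
```

Note that these coefficients refer to the scaled features, since the classifier is fitted on the output of the `scl` step.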
25
votes
2 answers

Is it possible to toggle a certain step in sklearn pipeline?

I wonder if we can set up an "optional" step in sklearn.pipeline. For example, for a classification problem, I may want to try an ExtraTreesClassifier with AND without a PCA transformation ahead of it. In practice, it might be a pipeline with an…
dolaameng
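A minimal sketch of one way to make a step optional, assuming a scikit-learn version recent enough to accept the string `"passthrough"` as a pipeline step (0.21 or later): the whole step is treated as a hyperparameter and the grid search compares the pipeline with and without it. The data, the step names `pca` and `clf`, and the component count are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data (assumption, stands in for the real classification problem).
X, y = make_classification(n_samples=150, n_features=6, random_state=0)

pipe = Pipeline([
    ("pca", PCA(n_components=3)),
    ("clf", ExtraTreesClassifier(n_estimators=20, random_state=0)),
])

# Replacing the step with "passthrough" disables it for that candidate.
param_grid = {"pca": [PCA(n_components=3), "passthrough"]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Outside a grid search, the same toggle can be applied by hand with `pipe.set_params(pca="passthrough")`.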
10
votes
3 answers

How to gridsearch over transform arguments within a pipeline in scikit-learn

My goal is to use one model to select the most important variables and another model to use those variables to make predictions. In the example below I am using two instances of RandomForestClassifier, but the second model could be any other…
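One possible arrangement, sketched under the assumption that `SelectFromModel` is an acceptable stand-in for the selection step: parameters of the selector and of the estimator nested inside it are addressed with the double-underscore syntax. The step names `select` and `clf`, the data, and the grid values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data (assumption, stands in for the real data).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

pipe = Pipeline([
    # First forest ranks the features; SelectFromModel keeps the important ones.
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=20, random_state=0))),
    # Second forest predicts from the selected features only.
    ("clf", RandomForestClassifier(n_estimators=20, random_state=1)),
])

param_grid = {
    # Argument of the transformer itself.
    "select__threshold": ["mean", "median"],
    # Argument of the estimator wrapped by the transformer.
    "select__estimator__max_depth": [2, None],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```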
5
votes
2 answers

How to create pandas output for custom transformers?

There are a lot of changes in scikit-learn 1.2.0 where it supports pandas output for all of the transformers but how can I use it in a custom transformer? In [1]: Here is my custom transformer which is a standard scaler: from sklearn.base import…
5
votes
2 answers

Sklearn Pipeline: How to build for kmeans, clustering text?

I have text as shown : list1 = ["My name is xyz", "My name is pqr", "I work in abc"] The above will be training set for clustering text using kmeans. list2 = ["My name is xyz", "I work in abc"] The above is my test set. I have built a vectorizer…
4
votes
1 answer

How can I check the changes made by Scikit-Learn Pipeline?

This is a very straightforward question, but I couldn't find the answer anywhere. I tried Google, TDS, Analytics Vidhya, StackOverflow, etc... so, here's the thing, I'm using Scikit-Learn Pipelines, but I wanted to see how my data was treated by the…
Yuxxxxxx
3
votes
0 answers

How to use a different feature set for each estimator in a multi-estimator sklearn pipeline

Below is an example sklearn pipeline. There are two sklearn StackingClassifiers: stackingclassifier1 with base classifier as RandomForestClassifier & stackingclassifier2 as Meta Learner. stackingclassifier2 with base classifier as…
3
votes
1 answer

SimpleImputer object has no attribute _fit_dtype

I have a trained scikit-learn model pipeline (including a SimpleImputer) that I'm trying to put into production. However, I get the following error when running it in the production environment. SimpleImputer object has no attribute _fit_dtype How…
3
votes
1 answer

How can I get features names when there is a preprocessor before feature selection?

I tried checking some posts like this, this and this but I still couldn't find what I need. These are the transformations I'm doing: cat_transformer = Pipeline(steps=[("encoder", TargetEncoder())]) num_transformer = Pipeline( steps=[ …
3
votes
0 answers

How to fit Sklearn Pipeline on Catboost Classifier with Embedding features

I have a Catboost Classifier that predicts on some embedding features, and AFAIK these embedding features can only be specified through Pools (meaning I have to create a pool and then pass the pool for the Catboost classifier's .fit method in order…
3
votes
1 answer

How to train an sklearn pipeline in AWS?

Working within a Sagemaker Jupyter Notebook I have an XGBoost pipeline which transforms my data and also runs some feature selection: steps_xgb = [('scaler', MinMaxScaler()), ('feature_reduction', SelectKBest(mutual_info_classif)), …
2
votes
2 answers

How can I use sklearn's make_column_selector to select all valid datetime columns?

I want to select columns based on their datetime data types. My DataFrame has for example columns with types np.dtype('datetime64[ns]'), np.datetime64 and 'datetime64[ns, UTC]'. Is there a generic way to select all columns with a datetime…
JAdel
2
votes
2 answers

Error finding attribute `feature_names_in_` that exists in docs

I'm getting the error AttributeError: 'LogisticRegression' object has no attribute 'feature_names_in_' even though that attribute is written in the docs. I'm on scikit-learn version 1.0.2. I created an object LogisticRegression and I am trying to…
2
votes
1 answer

How to preserve column names in scikit-learn ColumnTransformer?

I'm creating some pipelines using scikit-learn, but I'm having some trouble keeping the variable names as the original names rather than in the transformer_name__feature_name format. This is the scenario: I have a set of transformers, both custom and…
Rodrigo A