
I'm using sklearn.pipeline.Pipeline to chain feature extractors and a classifier. Is there a way to combine multiple feature extraction classes (for example the ones from sklearn.feature_extraction.text) in parallel and join their output?

My code right now looks as follows:

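from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier())])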

It results in the following:

vect -> tfidf -> clf

I want to be able to specify a pipeline that looks as follows:

vect1 -> tfidf1 \
                 -> clf
vect2 -> tfidf2 /
Gyan Veda
Daniel

1 Answer


This has been implemented recently in the master branch of scikit-learn under the name FeatureUnion:

http://scikit-learn.org/dev/modules/pipeline.html#feature-union
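
As a rough sketch of how the two-branch layout from the question might look with FeatureUnion: each branch is its own small Pipeline, and the union concatenates their outputs column-wise before the classifier. The branch names ('words', 'chars'), the analyzer settings, and the toy documents are assumptions made for illustration only; they are not from the question.

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

# Branch 1 (assumed setup): word-level counts, then tf-idf weighting.
word_branch = Pipeline([
    ('vect1', CountVectorizer(analyzer='word')),
    ('tfidf1', TfidfTransformer()),
])

# Branch 2 (assumed setup): character n-gram counts, then tf-idf weighting.
char_branch = Pipeline([
    ('vect2', CountVectorizer(analyzer='char', ngram_range=(2, 3))),
    ('tfidf2', TfidfTransformer()),
])

# FeatureUnion fits both branches on the same input and stacks their
# output matrices side by side; the classifier sees the joined features.
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('words', word_branch),
        ('chars', char_branch),
    ])),
    ('clf', SGDClassifier()),
])

# Toy data, purely illustrative.
docs = [
    'the cat sat on the mat',
    'dogs chase cats in the yard',
    'stock prices fell sharply today',
    'the market rallied after the report',
]
labels = [0, 0, 1, 1]

pipeline.fit(docs, labels)
predictions = pipeline.predict(docs)
```

The joined feature matrix has as many columns as the two branches produce together, so the classifier step needs no changes compared with the single-branch pipeline.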

ogrisel