How to fix tuple object error in feature union & pipelines (while using sklearn)?

Question

I have a pandas data frame with 56 columns. Around half of the columns are floats, the others are strings (textual data), and Col56 is the label column. The dataset looks something like this:

Col1  Col2  ...  Col26  Col27        Col28          ...  Col55     Col56
1     4     ...  76     I like cats  Cats are cool  ...  Cat bags  1
.
.
.
1900 rows

I want to use both the numeric and the textual data to run classification algorithms. A quick Google search suggested that the best way to proceed is to use FeatureUnion.

This is the code so far

import pandas as pd
import numpy as np
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

df = pd.read_csv('url')
X = df[[Col1...Col55]]
y = df[[Col56]]
stop_list = (i, am, the...)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
pipeline = Pipeline([
    ('union',FeatureUnion([
        ('Col1', Pipeline([
            ('selector', ItemSelector(column='Col1')),
            ('caster', ArrayCaster())
            ])),
.
.
.
.
.
        ('Col27',Pipeline([
            ('selector', ItemSelector(column='Col27')),
            ('vectorizer', CountVectorizer())
            ])), 
.
.
. 
        ('Col55',Pipeline([
            ('selector', ItemSelector(column='Col55')),
            ('vectorizer', CountVectorizer())
            ]))
    ])),
    ('model', SVC())
])

Then I get an error

TypeError                                 Traceback (most recent call last)
<ipython-input-8-7a2cab7bed7d> in <module>
    167         (' Col27',Pipeline([
    168             ('selector', ItemSelector(column=' Col27')),
--> 169             ('vectorizer', CountVectorizer(stop_words=stop_list))
    170         ]))

TypeError: 'tuple' object is not callable

I don't understand, since the exact same method is used here and here, and there does not seem to be any error in those examples. What am I doing wrong? How can I fix this?

The code you showed does not match the error. Please post the complete and correct code. Have you properly closed each column pipeline with `]))`? — Vivek Kumar, Jan 07 '19 at 07:41
I am talking about the internal `Pipeline`s. Look at the end of `'Col1'` Pipeline. All your other pipelines should end this way, with `]))`. They are not shown here in code. — Vivek Kumar, Jan 07 '19 at 09:57
@VivekKumar Yes, all the columns end in the same way as 'Col1'. (Added this part to the code to make it clear) — ratmatazz1123, Jan 07 '19 at 10:40
Your error seems to be with stop_list. Can you post the complete code you have for stop_list? — Venkatachalam, Jan 09 '19 at 04:32
@AI_Learning I have tried deleting stop_list. It does not seem to solve the problem. — ratmatazz1123, Jan 09 '19 at 11:27

score 0 · Answer 1 · answered May 13 '20 at 04:46

I think the issue is with CountVectorizer.

    cv = CountVectorizer()
    word_count_vector = cv.fit_transform(data)
    # shape is an attribute holding a tuple, so calling it raises the TypeError
    word_count_vector.shape()

This produces the same error as yours. You could also do this manually: use CountVectorizer to create a sparse matrix from the text data, then combine it with your numerical data (matrix or data frame) using sparse.hstack from scipy. It horizontally stacks matrices that have the same number of rows; the number of columns may differ.
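The manual approach described above could look roughly like the sketch below. It is only an illustration under assumptions: the tiny data frame, its values, and the choice of one numeric column (Col1) and one text column (Col27) are made up here and stand in for the real 56-column data.

```python
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Toy stand-in for the real data frame (hypothetical values).
df = pd.DataFrame({
    "Col1": [1.0, 2.0, 3.0, 4.0],
    "Col27": ["I like cats", "Cats are cool", "Dogs bark loudly", "Dogs are loud"],
    "Col56": [1, 1, 0, 0],
})

# Convert the numeric column(s) to a sparse matrix so they can be stacked
# with the sparse text features.
numeric = sparse.csr_matrix(df[["Col1"]].to_numpy(dtype=float))

# Note the parentheses: CountVectorizer() creates an instance.
cv = CountVectorizer()
text_counts = cv.fit_transform(df["Col27"])

# shape is an attribute (a tuple), so it is read without calling it.
n_rows, n_terms = text_counts.shape

# hstack requires the same number of rows; the column counts may differ.
features = sparse.hstack([numeric, text_counts]).tocsr()

# SVC accepts sparse input directly.
model = SVC()
model.fit(features, df["Col56"])
```

With a real train/test split, the vectorizer would normally be fitted on the training rows only and then applied to the test rows with `cv.transform`, so that both matrices share the same vocabulary columns.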
