I'm struggling to build a CountVectorizer model on a text DataFrame that I have. The DataFrame contains 4 columns, each holding relatively long text. For example:
Description     Comments        Summary         System Log
text text text  text text text  text text text  text text text
I created this function, which works well on each column separately, but I can't figure out how to do the same for the whole DataFrame at once:
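import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

def vectorize_column(column):  # function name is illustrative; column is one text column, e.g. df["Description"]
    vectorizer = CountVectorizer(max_features=1500, max_df=0.90, min_df=0.05)
    X = vectorizer.fit_transform(column)  # sparse matrix of term counts
    tfidfconverter = TfidfTransformer()
    X = tfidfconverter.fit_transform(X).toarray()
    # get_feature_names_out() needs scikit-learn >= 1.0; older versions use get_feature_names()
    return pd.DataFrame(X, columns=vectorizer.get_feature_names_out())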
The output I'm looking for is a DataFrame that looks something like this:
   able  above  abpwrk  accessor  according    action  activity    actual   without
0   0.0    0.0     0.0   0.00000        0.0  0.000000       0.0  0.000000  0.000000
1   0.0    0.0     0.0   0.07126        0.0  0.249390       0.0  0.000000  0.000000
It works if I merge all the columns into a single column of text, but something tells me there must be a smarter solution. Any ideas?