First of all, I am new to Python and NLP / machine learning. Right now I have the following code:
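from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# myTokenizer and data are defined elsewhere (not shown)
vectorizer = CountVectorizer(
    input="content",
    decode_error="ignore",
    strip_accents=None,
    stop_words=stopwords.words('english'),
    tokenizer=myTokenizer,
)
# Build the sparse matrix of word counts, one row per message
counts = vectorizer.fit_transform(data['message'].values)
classifier = MultinomialNB()
targets = data['sentiment'].values
classifier.fit(counts, targets)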
Now this actually works pretty well. I am getting a sparse matrix through the CountVectorizer, and the classifier makes use of the matrix as well as the targets (0, 2, 4).
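To show what that looks like in isolation, here is a minimal sketch with toy data. The `messages` list and its labels are made up for illustration and stand in for `data['message']` and `data['sentiment']`; the default tokenizer is used instead of `myTokenizer`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for data['message'] and data['sentiment']
messages = ["good movie", "bad movie", "okay movie", "good good film"]
targets = [4, 0, 2, 4]

vectorizer = CountVectorizer()
# Sparse matrix: one row per message, one column per vocabulary word
counts = vectorizer.fit_transform(messages)

classifier = MultinomialNB()
classifier.fit(counts, targets)

n_rows, n_cols = counts.shape
vocabulary_size = len(vectorizer.vocabulary_)
classes = classifier.classes_.tolist()
```

Here `counts` has one row per message and one column per distinct word, and the classifier learns the three target classes from it.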
However, what would I have to do if I wanted to use more features in the vector instead of just the word counts? I can't seem to find that out. Thank you in advance.
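As a rough illustration of what "more features" could mean here, the sketch below appends one extra column to the word-count matrix using `scipy.sparse.hstack`. The toy `messages`, the labels, and the exclamation-mark feature are all assumptions made up for the example, and this is only one possible approach, not a confirmed answer.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for data['message'] and data['sentiment']
messages = ["good movie!!", "bad movie", "okay movie", "good good film!"]
targets = [4, 0, 2, 4]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(messages)

# Hypothetical extra feature: number of exclamation marks per message.
# MultinomialNB expects non-negative feature values, which a count satisfies.
extra = np.array([[m.count("!")] for m in messages])

# Stack the extra column to the right of the word counts, keeping it sparse
combined = hstack([counts, csr_matrix(extra)]).tocsr()

classifier = MultinomialNB()
classifier.fit(combined, targets)
```

The same extra column would have to be computed and stacked in the same order for any new messages passed to `classifier.predict`, with `vectorizer.transform` (not `fit_transform`) producing the word counts.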