I'm new to scikit-learn and currently studying multinomial Naïve Bayes. Right now I'm vectorizing text with CountVectorizer from sklearn.feature_extraction.text, and for some reason, when I vectorize some text, the word "I" doesn't show up in the output array or in the feature names.
Code:
from sklearn.feature_extraction.text import CountVectorizer

x_train = ['I am a Nigerian hacker', 'I like puppies']
# convert x_train to vectorized text (one row of token counts per document)
vectorizer_train = CountVectorizer(min_df=0)
vectorizer_train.fit(x_train)
x_train_array = vectorizer_train.transform(x_train).toarray()
# print vectorized text, feature names
print(x_train_array)
print(vectorizer_train.get_feature_names())
Output:
[[1 1 0 1 0]
 [0 0 1 0 1]]
[u'am', u'hacker', u'like', u'nigerian', u'puppies']
Why doesn't "I" seem to show up in the feature names? When I change it to "Ia" or something else like that, it does show up.
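For reference, here is a minimal sketch that reproduces the same behavior without scikit-learn, using only the `re` module. It assumes that CountVectorizer lowercases the text and then tokenizes it with its documented default `token_pattern`, `(?u)\b\w\w+\b`, which only matches tokens of two or more word characters. The `tokenize` helper and the relaxed pattern are hypothetical names introduced for illustration, not part of the library.

```python
import re

# Assumption: this mirrors CountVectorizer's documented default
# token_pattern, which requires at least two word characters per token.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"
# Hypothetical relaxed pattern that also keeps single-character tokens.
SINGLE_CHAR_TOKEN_PATTERN = r"(?u)\b\w+\b"


def tokenize(text, pattern):
    """Lowercase the text and return every regex match as a token."""
    return re.findall(pattern, text.lower())


x_train = ['I am a Nigerian hacker', 'I like puppies']

# Tokens under the assumed default pattern: "I" and "a" are dropped.
default_tokens = [tokenize(doc, DEFAULT_TOKEN_PATTERN) for doc in x_train]
# Tokens under the relaxed pattern: single characters are kept.
relaxed_tokens = [tokenize(doc, SINGLE_CHAR_TOKEN_PATTERN) for doc in x_train]

print(default_tokens)
print(relaxed_tokens)
```

If that assumption holds, it would also explain why "Ia" shows up (it has two characters) and why "a" is missing from the feature names as well. Passing a pattern such as `token_pattern=r"(?u)\b\w+\b"` to CountVectorizer is one way that might keep single-character tokens, though I have not confirmed that this is the recommended approach.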