How can I associate my TF-IDF matrix with a category? For example, I have the data set below:

**ID**        **Text**                                     **Category**
   1     jake loves me more than john loves me               Romance
   2     july likes me more than robert loves me             Friendship
   3     He likes videogames more than baseball              Interest 

Once I calculate TF-IDF for every sentence, taking the 'Text' column as my input, how can I train the system to associate each row of the matrix with its category above, so that I can reuse the result on my test data?

Using the training data set above, when I pass a new sentence such as 'julie is a lovely person', I would like that sentence to be categorized into one or more of the pre-defined categories above.

I used the question "Keep TFIDF result for predicting new content using Scikit for Python" as my starting point, but I could not understand how to map the TF-IDF matrix for a sentence to a category.

RData

1 Answer

It looks like you have already vectorized the text, i.e. converted it to numbers so that you can use scikit-learn's classifiers. The next step is to train a classifier. You can follow this link. The steps look like this:

Vectorization

from sklearn.feature_extraction.text import CountVectorizer
# your_text: the training sentences (the 'Text' column)
count_vect = CountVectorizer()
X_train = count_vect.fit_transform(your_text)

Train classifier

from sklearn.naive_bayes import MultinomialNB
# y_train: one category label per training sentence (the 'Category' column)
clf = MultinomialNB().fit(X_train, y_train)

Predict on new docs:

docs_new = ['God is love', 'OpenGL on the GPU is fast']
X_new = count_vect.transform(docs_new)
predicted = clf.predict(X_new)
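
Since the question is about TF-IDF rather than raw counts, here is a minimal sketch of the same three steps using the data set from the question. It assumes TfidfVectorizer in place of CountVectorizer and wraps both steps in a pipeline; the variable names (texts, categories, model) are illustrative and not taken from the original posts. The category list is what gets passed as y_train.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Training data copied from the question: 'Text' is the input, 'Category' is the label.
texts = [
    "jake loves me more than john loves me",
    "july likes me more than robert loves me",
    "He likes videogames more than baseball",
]
categories = ["Romance", "Friendship", "Interest"]  # plays the role of y_train

# The pipeline learns the TF-IDF vocabulary and the classifier together,
# so new text is always transformed with the vocabulary seen in training.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, categories)

# Predict a single category for unseen text.
new_docs = ["julie is a lovely person"]
predicted = model.predict(new_docs)

# Per-category probabilities, useful if more than one category may apply.
probabilities = model.predict_proba(new_docs)
classes = model.named_steps["multinomialnb"].classes_
for doc, label, row in zip(new_docs, predicted, probabilities):
    print(doc, "->", label, dict(zip(classes, row.round(3))))
```

Each row of the TF-IDF matrix is tied to its category purely by position: row i of the matrix is paired with label i during fit. With only three training rows, and a new sentence that shares no words with them, the prediction carries little information; a realistic training set needs many examples per category.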
elyase
  • Yes, I have already converted the text to numbers, but how would the system know which category those numbers belong to? I have transformed the text to numbers, but I was not able to tag the numbers for each text with a category, which is what I would like to do (as shown in the data set in my question). – RData Jun 07 '16 at 12:32
  • That's what the classifier and prediction steps do. The predicted variable will hold the categories for the new text. – elyase Jun 07 '16 at 12:48
  • Is y_train my category? – RData Jun 07 '16 at 12:49