
I want to extract the keywords of a text with the TF-IDF method in sklearn

I have fitted a TF-IDF vectorizer; see the code below:

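```python
from sklearn.feature_extraction import text

tfidf_vect = text.TfidfVectorizer()
texts = get_text_list()  # my own helper, returns a list of document strings
tfidf = tfidf_vect.fit_transform(texts)  # learn vocabulary + IDF, return the document-term matrix
```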

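Now I transform a new document, `new_doc` (a single string). It is wrapped in a list because `transform()` expects an iterable of documents, and it is not called `text` so that it does not shadow the `text` module imported above:

```python
res = tfidf_vect.transform([new_doc])
```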

`res` is a `scipy.sparse.csr_matrix`. `res.indices` holds the column positions (vocabulary indices) of the words present in the document, and `res.data` holds their TF-IDF values.

How can I sort `res` by `res.data`, i.e. get the words in descending order of TF-IDF score?

Reference: http://www.cs.duke.edu/courses/spring14/compsci290/assignments/lab02.html

Fred Foo
maoyang
  • Is the goal to sort it in descending order of TF-IDF score? `tfidf_vect.get_feature_names()` (renamed `get_feature_names_out()` in scikit-learn 1.0 and later) will give you the word for each position in the TF-IDF vectors. – confuser Jun 27 '14 at 23:53
  • @confuser, Thanks. Yes, I want to sort it in descending order of TF-IDF score. Do you know how? – maoyang Jun 28 '14 at 04:41
  • `res.toarray()[0].argsort()` will return the indices in **ascending** order of TF-IDF score, and you can use those indices to look up the word list you get from `tfidf_vect.get_feature_names()`. Hope this helps! – confuser Jun 28 '14 at 18:06
  • @confuser, Thank you so much! It helps me a lot. For descending order, I referred to http://stackoverflow.com/questions/16486252/is-it-possible-to-use-argsort-in-descending-order :) – maoyang Jun 28 '14 at 23:22
