
I've followed this example for computing the TF-IDF of each word in my documents. However, my final output looks like the following, which is expected, since HashingTF reports hash indices rather than the words themselves:

```
(262144,[24856,31066,96984,119418,143328,176968,193347,223999,243191,245270,250475],[2.3513752571634776,1.9459101490553132,1.9459101490553132,2.3513752571634776,1.4350845252893227,2.3513752571634776,2.3513752571634776,1.9459101490553132,3.8918202981106265,1.9459101490553132,2.3513752571634776])
(262144,[21028,31066,71524,72609,116873,140075,142830,155149,222394,226568,245044],[1.9459101490553132,1.9459101490553132,1.6582280766035324,2.3513752571634776,2.3513752571634776,1.9459101490553132,1.9459101490553132,2.3513752571634776,1.9459101490553132,1.252762968495368,1.9459101490553132])
```
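For context, I understand each line above is a SparseVector printed as (numFeatures, [indices], [values]), so 262144 is just the number of hash buckets, and the indices are bucket ids rather than words. The only reverse mapping I can think of is re-hashing a candidate term with the same settings, roughly like this (untested sketch; I'm assuming the murmur3 hashing that spark.ml's HashingTF defaults to in 2.x):

```scala
import org.apache.spark.mllib.feature.HashingTF

// Same bucket count as the vectors above, same hash as spark.ml's default.
val hasher = new HashingTF(262144).setHashAlgorithm("murmur3")

// Gives the index a term lands on in the sparse vectors, e.g. for "spark".
val bucket: Int = hasher.indexOf("spark")
println(bucket)
```

That only gives word -> index, though; to invert it I would have to hash my entire vocabulary, which feels backwards.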

Is there an API that maps each word back to its TF-IDF value?
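Otherwise, is swapping HashingTF for CountVectorizer the idiomatic fix? CountVectorizerModel keeps an explicit vocabulary array, so the vector indices stay invertible. Here is a minimal sketch of what I mean (the column names and toy corpus are mine):

```scala
import org.apache.spark.ml.feature.{CountVectorizer, IDF}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("tfidf-words").master("local[*]").getOrCreate()
import spark.implicits._

// Toy corpus; each row is a pre-tokenized document.
val docs = Seq(
  Seq("spark", "hashing", "tf", "idf"),
  Seq("spark", "count", "vectorizer")
).toDF("words")

// CountVectorizer records index -> word in its vocabulary array.
val cvModel = new CountVectorizer().setInputCol("words").setOutputCol("tf").fit(docs)
val tf = cvModel.transform(docs)
val tfidf = new IDF().setInputCol("tf").setOutputCol("tfidf").fit(tf).transform(tf)

// Walk the active entries of each vector and translate indices back to words.
val vocab = cvModel.vocabulary
tfidf.select("tfidf").collect().foreach { row =>
  row.getAs[Vector](0).foreachActive { (i, w) =>
    println(s"${vocab(i)} -> $w")
  }
}
```

If I have to keep HashingTF (say, for memory reasons), I suppose I could hash every distinct token in the corpus the same way and build the index -> word map myself, accepting collisions.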

scarface
    Possible duplicate of [How to get word details from TF Vector RDD in Spark ML Lib?](https://stackoverflow.com/questions/32285699/how-to-get-word-details-from-tf-vector-rdd-in-spark-ml-lib) – user10938362 May 13 '19 at 18:17
  • @user10938362 this may help. Thanks, will try. – scarface May 15 '19 at 11:58

0 Answers