How to view TF-IDF results?

Question

I am looking at this example

https://www.analyticsvidhya.com/blog/2019/04/predicting-movie-genres-nlp-multi-label-classification/

specifically at the lines that create the TF-IDF features:

# create TF-IDF features
xtrain_tfidf = tfidf_vectorizer.fit_transform(xtrain)
xval_tfidf = tfidf_vectorizer.transform(xval)

When I try to view the contents of xtrain_tfidf, I get this output:

xtrain_tfidf
Out[69]: 
<33434x10000 sparse matrix of type '<class 'numpy.float64'>'
    with 3494870 stored elements in Compressed Sparse Row format>

I would like to see what xtrain_tfidf actually contains.

How can I view it?

Where do you see the error? It's just telling you that the result is a sparse matrix — yatu, Apr 23 '20 at 08:04
The duplicate is related, but doesn't answer the question quite as asked. I find the answer here complementary and worth having as an answer to this slightly different question. Voting to reopen. — joanis, Apr 23 '20 at 14:46

jmkjaer · Accepted Answer · 2020-04-23T09:08:59.167

Jupyter (or rather IPython (or rather the Python REPL)) implicitly calls xtrain_tfidf.__repr__() when you evaluate the name of the variable. Using print calls xtrain_tfidf.__str__(), which is what you're looking for when you want to see the nonzero values in a sparse matrix:

print(xtrain_tfidf)
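
To make the difference between the two representations concrete, here is a minimal sketch on a tiny matrix. The matrix m is a made-up stand-in for xtrain_tfidf, and the exact layout of both strings varies between SciPy versions, so no expected output is shown.

```python
from scipy.sparse import csr_matrix

# Hypothetical 2x3 matrix standing in for xtrain_tfidf
m = csr_matrix([[0.5, 0.0, 0.0],
                [0.0, 0.0, 0.25]])

summary = repr(m)  # what the REPL shows when you evaluate a bare `m`
entries = str(m)   # what print(m) shows: one (row, col) value line per stored element

print(summary)
print(entries)
```

Each line of the printed entries pairs a (row, column) coordinate with its TF-IDF weight; the column index refers to a term in the vectorizer's vocabulary.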

If you want to print everything including zero values, slowness and possible out-of-memory be darned (a dense float64 array of shape 33434x10000 needs roughly 2.7 GB), then try:

import numpy as np

with np.printoptions(threshold=np.inf):
    print(xtrain_tfidf.toarray())
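
If the goal is to understand what the matrix holds rather than to dump it, a lighter option is to map the nonzero columns of a single row back to their terms. The following is only a sketch under assumptions: the three toy documents and the helper top_terms are hypothetical, and the vectorizer uses scikit-learn's default settings rather than the ones from the article. It reads the CSR arrays (indptr, indices, data) directly and uses the fitted vocabulary_ mapping to recover the term for each column.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for xtrain (hypothetical data, not from the article)
docs = [
    "a detective hunts a killer in the city",
    "two friends fall in love in the city",
    "a killer robot hunts two friends",
]

tfidf_vectorizer = TfidfVectorizer()
xtrain_tfidf = tfidf_vectorizer.fit_transform(docs)

# Invert vocabulary_ (term -> column) into an array indexed by column
terms = np.empty(len(tfidf_vectorizer.vocabulary_), dtype=object)
for term, col in tfidf_vectorizer.vocabulary_.items():
    terms[col] = term


def top_terms(matrix, row, k=5):
    """Return the k highest-weighted (term, score) pairs for one row of a CSR matrix."""
    start, end = matrix.indptr[row], matrix.indptr[row + 1]
    cols = matrix.indices[start:end]   # column indices of the stored entries
    vals = matrix.data[start:end]      # their TF-IDF weights
    order = np.argsort(vals)[::-1][:k]  # positions of the k largest weights
    return [(terms[cols[i]], float(vals[i])) for i in order]


for term, score in top_terms(xtrain_tfidf, 0):
    print(f"{term:10s} {score:.3f}")
```

To eyeball a small dense block instead, slicing before converting, as in xtrain_tfidf[:5, :20].toarray(), densifies only that slice rather than the whole matrix.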
