I am looking at this example

https://www.analyticsvidhya.com/blog/2019/04/predicting-movie-genres-nlp-multi-label-classification/

specifically at the lines that create the TF-IDF features:

# create TF-IDF features
xtrain_tfidf = tfidf_vectorizer.fit_transform(xtrain)
xval_tfidf = tfidf_vectorizer.transform(xval)

When I try to view the contents of xtrain_tfidf, I get this message:

xtrain_tfidf
Out[69]: 
<33434x10000 sparse matrix of type '<class 'numpy.float64'>'
    with 3494870 stored elements in Compressed Sparse Row format>

I would like to see what xtrain_tfidf contains. How can I view it?

asmgx
  • Where do you see the error? It's just telling you that the result is a sparse matrix – yatu Apr 23 '20 at 08:04
  • ok, how to view the results in TFIDF? – asmgx Apr 23 '20 at 08:08
  • The duplicate is related, but doesn't answer the question quite as asked. I find the answer here complementary and worth having as an answer to this slightly different question. Voting to reopen. – joanis Apr 23 '20 at 14:46

1 Answer

Jupyter (or rather IPython, or rather the Python REPL) implicitly calls xtrain_tfidf.__repr__() when you evaluate the name of the variable, which is why you only get the one-line summary; it is not an error. Using print calls xtrain_tfidf.__str__() instead, which is what you're looking for when you want to see the nonzero values in a sparse matrix:

print(xtrain_tfidf)
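
The printed form lists each stored entry by its (row, column) position and value, which is easier to read once the column indices are mapped back to terms. Below is a minimal sketch of that idea on a tiny hand-built matrix. The matrix, the feature_names array, and the top_terms helper are all made up for illustration; in the question's setting the names would presumably come from tfidf_vectorizer.get_feature_names_out() (or get_feature_names() on older scikit-learn versions).

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-in for xtrain_tfidf: 3 documents, 5 vocabulary terms.
dense = np.array([
    [0.0, 0.7, 0.0, 0.3, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.9, 0.0, 0.0],
])
tfidf = csr_matrix(dense)

# Hypothetical vocabulary, one name per column of the matrix.
feature_names = np.array(["action", "love", "space", "war", "zombie"])


def top_terms(matrix, names, row, k=3):
    """Return up to k (term, weight) pairs with the largest weights in one row."""
    # In CSR format, the stored entries of `row` live in this slice.
    start, end = matrix.indptr[row], matrix.indptr[row + 1]
    cols = matrix.indices[start:end]
    vals = matrix.data[start:end]
    order = np.argsort(vals)[::-1][:k]  # positions of the largest weights first
    return [(str(names[cols[i]]), float(vals[i])) for i in order]


print(tfidf)                               # stored entries only
print(top_terms(tfidf, feature_names, 0))  # terms of document 0 by weight
```

Reading the indptr, indices, and data arrays directly avoids building a dense copy, so the same helper should also be usable on a matrix as large as the one in the question.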

If you want to print everything including zero-values, slowness and possible out-of-memory be darned, then try

import numpy as np

with np.printoptions(threshold=np.inf):
    print(xtrain_tfidf.toarray())
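
For scale: a dense float64 copy of the 33434x10000 matrix from the question would need 33434 * 10000 * 8 bytes, roughly 2.7 GB. A lighter alternative is to slice a few rows and columns first and convert only that block. The sketch below uses a small random matrix as a hypothetical stand-in for xtrain_tfidf, and dense_size_bytes is an illustrative helper, not a SciPy function.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-in for xtrain_tfidf: mostly zeros, about 10% nonzero.
rng = np.random.default_rng(0)
dense = rng.random((200, 50))
dense[dense < 0.9] = 0.0
big = csr_matrix(dense)


def dense_size_bytes(matrix):
    """Memory a full dense copy of `matrix` would need, in bytes."""
    rows, cols = matrix.shape
    return rows * cols * matrix.dtype.itemsize


print(dense_size_bytes(big))  # check the cost before calling toarray() on everything

# Convert only the first 5 rows and 10 columns to a dense array.
block = big[:5, :10].toarray()
print(block)
```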
jmkjaer