
I have a scipy.sparse.csr.csr_matrix object of shape (118723, 20748). When I try to convert it into an array using the toarray() function, it raises a MemoryError.

Here is my code:

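import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(min_df=5,stop_words='english')
X = tfidf.fit_transform(sentance_list)  # sentance_list: list of sentence strings; X is a sparse CSR matrix
# toarray() allocates the full dense matrix; newer scikit-learn replaces get_feature_names() with get_feature_names_out()
sentanceDF=pd.DataFrame(X.toarray(),columns=tfidf.get_feature_names())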

Error:

*** MemoryError: 

Please help me resolve this issue.

Vikram Singh Chandel
  • That will need more than 18 GB of memory. – sascha Jan 31 '18 at 12:39
  • @sascha Is there any other way to do it? – Vikram Singh Chandel Jan 31 '18 at 13:00
  • Keep it sparse. What's possible depends on pandas and your use case. But a dense matrix of size (118723, 20748) will always need ```2463264804*x-bits``` of memory (e.g. x=64 for double). – sascha Jan 31 '18 at 13:02
  • Try this? https://stackoverflow.com/questions/17818783/populate-a-pandas-sparsedataframe-from-a-scipy-sparse-matrix. It looks like a potential workaround, but I can't say for sure how much RAM it will eat. – Dylan Jan 31 '18 at 16:33
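
To make the arithmetic in the comments concrete, and to show one way of "keeping it sparse", here is a minimal sketch. It assumes a pandas version that provides `pd.DataFrame.sparse.from_spmatrix` (0.25 or later). The helper `dense_bytes`, the random stand-in matrix and the `term_i` column names are hypothetical; with the code from the question, the matrix would be `X` and the column names would come from the vectorizer.

```python
import pandas as pd
from scipy import sparse


def dense_bytes(shape, bits_per_value=64):
    """Bytes needed to hold a dense array of this shape (hypothetical helper)."""
    rows, cols = shape
    return rows * cols * bits_per_value // 8


# Shape taken from the question; 64 bits per value corresponds to float64.
full_shape = (118723, 20748)
print(f"dense size: {dense_bytes(full_shape) / 2**30:.1f} GiB")

# Small random CSR matrix standing in for the TF-IDF output (assumption:
# the real matrix is similarly sparse, but far larger).
X = sparse.random(1000, 200, density=0.01, format="csr", random_state=0)
columns = [f"term_{i}" for i in range(X.shape[1])]  # placeholder names

# Build a DataFrame with sparse columns directly, without calling toarray().
sparse_df = pd.DataFrame.sparse.from_spmatrix(X, columns=columns)
print(sparse_df.shape)
```

Whether the sparse DataFrame is small enough in practice depends on how many non-zero entries the real matrix has, and not every pandas operation preserves sparsity, so this is a starting point rather than a guaranteed fix.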

0 Answers