Following my previous question, I wrote code that creates a document-term matrix (DTM). I then need to do some calculations across the columns and rows of my DTM. However, Python hangs when computing the last line, and it is effectively impossible to run the code (the whole PC freezes). How can I make the process smoother?

Here is the code I am running (of course, the real texts list is much larger):

texts = ['text1', 'text4', 'text2', 'text3']  # each text has already been stemmed and had its punctuation removed

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
import itertools 

merged = list(itertools.chain.from_iterable(texts))

vect = CountVectorizer(min_df=0., max_df=1.0)
X = vect.fit_transform(texts)
df = pd.DataFrame(X.toarray().transpose(), index = vect.get_feature_names())
dnquixote
  • Could you give us the shape of `X`? – MMF Oct 08 '16 at 18:53
  • The instruction probably needs too much memory. Open a memory monitor before running it. Unfortunately this can be hard to avoid: when computing large matrices (especially multidimensional ones), it is not trivial to free the space used by data that has already been processed, because the interpreter does not know which cells of the matrix are no longer needed. You could run it in the cloud on a machine with a large amount of memory. Also, the time to evaluate a multidimensional matrix grows exponentially with the number of dimensions, since many dimensions typically produce a huge number of cells. – sergzach Oct 10 '16 at 12:17
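
Following up on both comments, here is a minimal sketch of how one might check the shape of `X` and estimate what `X.toarray()` would cost before calling it. The toy `texts` list, the variable names (`dense_bytes`, `term_totals`, `doc_lengths`, `df_sparse`), and the choice of row/column sums as the "calculations" are all assumptions for illustration, not taken from the question. The idea is that `fit_transform` already returns a SciPy sparse matrix, so sums over rows and columns can probably be done without ever building the dense array.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for the real corpus (assumption: the real list is far larger).
texts = ["cat sat mat", "dog sat log", "cat dog play", "bird sat tree"]

vect = CountVectorizer()
X = vect.fit_transform(texts)  # SciPy sparse matrix: (n_documents, n_terms)

# Shape and a rough memory estimate for the dense equivalent of X.
n_docs, n_terms = X.shape
dense_bytes = n_docs * n_terms * X.dtype.itemsize
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
print(f"shape={X.shape}, dense ~{dense_bytes} B, sparse ~{sparse_bytes} B")

# Row and column totals computed directly on the sparse matrix.
term_totals = np.asarray(X.sum(axis=0)).ravel()  # occurrences of each term
doc_lengths = np.asarray(X.sum(axis=1)).ravel()  # tokens counted per document

# Feature names: the method was renamed in newer scikit-learn releases.
if hasattr(vect, "get_feature_names_out"):
    terms = vect.get_feature_names_out()
else:
    terms = vect.get_feature_names()

# If a DataFrame is really needed, keep it sparse (terms as rows, as in the question).
df_sparse = pd.DataFrame.sparse.from_spmatrix(X.T, index=terms)
print(df_sparse.shape)
```

If `dense_bytes` turns out to be larger than the available RAM, that would explain the freeze on the last line. Raising `min_df` or setting `max_features` in `CountVectorizer` is another way to shrink the vocabulary, at the cost of dropping rare terms.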

0 Answers