I have created pre-processed data. Now I would like to vectorize it and write it to a text file. When I convert the vectorizer output to an array, I get the error below. What are possible solutions?
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
vectorizer = CountVectorizer(analyzer="word",
                             tokenizer=None,
                             preprocessor=None,
                             stop_words=None,
                             max_features=1000)
newTestFile = open("testfile.txt", 'r', encoding='latin-1')
featureVector = vectorizer.fit_transform(newTestFile)
train_data_features = featureVector.toarray()
np.savetxt('plotFeatureVector.txt', train_data_features, fmt="%10s %10.3f")
The error:
Traceback (most recent call last):
File "C:/Users/NuMA/Desktop/Lecture Stuff/EE 485/Project/Deneme/bagOfWords.py", line 12, in <module>
train_data_features = featureVector.toarray()
File "C:\Users\NuMA\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\sparse\compressed.py", line 964, in toarray
return self.tocoo(copy=False).toarray(order=order, out=out)
File "C:\Users\NuMA\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\sparse\coo.py", line 252, in toarray
B = self._process_toarray_args(order, out)
File "C:\Users\NuMA\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\sparse\base.py", line 1039, in _process_toarray_args
return np.zeros(self.shape, dtype=self.dtype, order=order)
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
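In case it helps, here is a minimal sketch of the kind of workaround I am considering (my assumption being that the goal is only a plain-text dump of the counts, so the full matrix never needs to be densified at once; the file names are the same as above):

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(analyzer="word", max_features=1000)

# fit_transform returns a scipy.sparse matrix; keep it sparse
with open("testfile.txt", 'r', encoding='latin-1') as newTestFile:
    featureVector = vectorizer.fit_transform(newTestFile)

# write one row at a time instead of calling toarray() on everything
with open("plotFeatureVector.txt", 'w') as out:
    for row in featureVector:
        counts = row.toarray().ravel()  # dense copy of a single row only
        out.write(" ".join(str(c) for c in counts) + "\n")

Would something along these lines be the right direction, or is there a more standard way (for example, saving the sparse matrix directly)?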