I am trying to apply the SelectKBest algorithm to my data to extract the best features from it. As a first step I preprocess the data with DictVectorizer. The data set has 1,061,427 rows and 15 features, and each feature takes many distinct values, so I believe the memory error is caused by this high cardinality.
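In simplified form, the preprocessing step looks roughly like this (the dict contents are illustrative; DV and quote_data are the names from my actual script):

    from sklearn.feature_extraction import DictVectorizer

    # quote_data is a list of 1,061,427 dicts, one per row, e.g.
    # {'feature1': 'A', 'feature2': 'X127', ...} with 15 keys each
    DV = DictVectorizer()

    # fit_transform returns a scipy.sparse matrix; calling .toarray()
    # allocates the full dense array, which is where it fails
    quote_data = DV.fit_transform(quote_data).toarray()

As a rough calculation: DictVectorizer creates one column per distinct (feature, value) pair, so if my 15 features expand to, say, 100,000 one-hot columns (a guess, I have not counted the exact vocabulary), the dense float64 array would need about 1,061,427 × 100,000 × 8 bytes ≈ 850 GB, which would explain the failure even on this machine.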
I get the following error:
File "FeatureExtraction.py", line 30, in <module>
quote_data = DV.fit_transform(quote_data).toarray()
File "/usr/lib64/python2.6/site-packages/scipy/sparse/compressed.py", line 563, in toarray
return self.tocoo(copy=False).toarray()
File "/usr/lib64/python2.6/site-packages/scipy/sparse/coo.py", line 233, in toarray
B = np.zeros(self.shape, dtype=self.dtype)
MemoryError
Is there an alternate way I could do this? And why do I get a MemoryError at all on a machine that has 256 GB of RAM?
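To make the question concrete, this is the kind of alternative I have in mind: skipping toarray() entirely and feeding the sparse matrix straight into SelectKBest. A minimal sketch, assuming the chi2 scorer (which as far as I know accepts scipy sparse input); the target vector y and the value k=100 are made-up placeholders:

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.feature_selection import SelectKBest, chi2

    DV = DictVectorizer()                    # sparse output is the default
    X = DV.fit_transform(quote_data)         # scipy.sparse matrix, never densified

    # chi2 works on sparse, non-negative matrices, so the one-hot
    # encoded data would never have to be expanded to a dense array
    selector = SelectKBest(chi2, k=100)      # k=100 is an arbitrary example
    X_best = selector.fit_transform(X, y)    # y: my target labels

Does an approach along these lines make sense, or is there a better way to handle this many one-hot columns?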
Any help is appreciated!