I have a 51K × 8.5K data frame containing only binary (1 or 0) values.
I wrote the following code:

# Pickle the data frame to disk
import pickle

outfile = open("df_preference.p", "wb")
pickle.dump(df_preference, outfile)
outfile.close()
It throws a MemoryError, as below:
MemoryError Traceback (most recent call last)
<ipython-input-48-de66e880aacb> in <module>()
2
3 outfile=open("df_preference.p", "wb")
----> 4 pickle.dump(df_preference,outfile)
5 outfile.close()
I assume this means the data is too large to be pickled? But it contains only binary values.
Before this, I created this data frame from another one that held ordinary counts with lots of zeros, using the following code:

df_preference = df_recommender.applymap(lambda x: np.where(x > 0, 1, 0))

This step alone took quite some time, even though the matrix is the same size.
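For reference, here is a minimal self-contained reproduction of the conversion step, with toy data standing in for my real 51K × 8.5K frame; the vectorized comparison at the end is just an alternative I have been considering, not what I currently run:

```python
import numpy as np
import pandas as pd

# Toy stand-in for my real df_recommender (counts with many zeros)
df_recommender = pd.DataFrame({"a": [0, 3, 1], "b": [2, 0, 0]})

# The element-wise conversion I used (slow on the full matrix):
df_preference = df_recommender.applymap(lambda x: np.where(x > 0, 1, 0))

# A vectorized comparison that should give the same 0/1 matrix,
# stored compactly as int8:
df_fast = (df_recommender > 0).astype(np.int8)

print(df_fast)
```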
My concern is: if (i) building the data frame with applymap is already slow, and (ii) the data frame cannot even be pickled because of a memory error, then the matrix factorization I need to run next on df_preference (SVD and Alternating Least Squares) will presumably be slower still. How can I tackle the slow run time and solve the memory error?
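One idea I had, since the matrix is mostly zeros, is to store it as a sparse matrix instead of pickling the dense frame; I am not sure this is the right approach, but here is a minimal sketch on toy data (the scipy.sparse usage is my assumption):

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy stand-in for my real 0/1 preference matrix
df_preference = pd.DataFrame([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=np.int8)

# CSR format stores only the non-zero entries, which should shrink
# both the in-memory footprint and the file on disk:
mat = sparse.csr_matrix(df_preference.values)
sparse.save_npz("df_preference.npz", mat)

# Loading it back:
mat2 = sparse.load_npz("df_preference.npz")
print(mat2.nnz)  # number of stored non-zeros
```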
Thanks