I want to create a sparse matrix of one-hot encoded features from a data frame df,
but the code below runs out of memory. The shape of sparse_onehot
is (450138, 1508).
import pandas as pd
import scipy.sparse

sp_features = ['id', 'video_id', 'genre']
sparse_onehot = pd.get_dummies(df[sp_features], columns=sp_features)
# .values materialises the full dense array before the sparse conversion
X = scipy.sparse.csr_matrix(sparse_onehot.values)
I get the memory error shown below.
MemoryError: Unable to allocate 647. MiB for an array with shape (1508, 450138) and data type uint8
I have also tried scipy.sparse.lil_matrix and get the same error.
Is there an efficient way to handle this? Thanks in advance.
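
One direction that may be worth trying, as a minimal sketch rather than a confirmed fix: ask pandas for sparse columns directly, so that no dense (450138, 1508) array is built along the way. The tiny df below is a hypothetical stand-in for the real data, and the choice of uint8 is an assumption taken from the dtype in the error message.

```python
import numpy as np
import pandas as pd
import scipy.sparse

# Hypothetical stand-in for the real data frame (assumption: same column names).
df = pd.DataFrame({
    "id": [1, 2, 1, 3],
    "video_id": [10, 10, 20, 30],
    "genre": ["a", "b", "a", "c"],
})
sp_features = ["id", "video_id", "genre"]

# sparse=True makes every dummy column a pandas sparse column,
# so only the non-zero entries are stored.
sparse_onehot = pd.get_dummies(
    df[sp_features], columns=sp_features, sparse=True, dtype=np.uint8
)

# Convert the all-sparse frame to a SciPy COO matrix, then to CSR,
# without going through .values (which would densify it).
X = sparse_onehot.sparse.to_coo().tocsr()

print(X.shape, X.nnz)
```

Each row has exactly one non-zero entry per encoded feature, so the stored data grows with the number of rows times three rather than rows times 1508. Whether this fits in memory on the real data is untested here. scikit-learn's OneHotEncoder, which returns a SciPy sparse matrix by default, may be another option.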