I'm writing a machine learning algorithm on huge but sparse data: my matrix has shape (347, 5 416 812 801), and only 0.13% of the entries are non-zero. The sparse matrix takes about 105 000 bytes (< 1 MB) and is in CSR format.
I'm trying to separate train/test sets by choosing a list of example indices for each, so I want to split my dataset in two using:

    training_set = matrix[train_indices]  # shape (len(train_indices), 5 416 812 801), still sparse
    testing_set = matrix[test_indices]    # shape (347 - len(train_indices), 5 416 812 801), also sparse

with `train_indices` and `test_indices` two lists of ints.
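For reference, here is a small-scale sketch of the same split. The shapes, density, `scipy.sparse.random` call, and index lists are toy stand-ins for the real data, not the actual code:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Toy stand-in for the real (347, 5 416 812 801) matrix, same 347 rows.
matrix = sparse_random(347, 1000, density=0.0013, format="csr", random_state=0)

rng = np.random.default_rng(0)
train_indices = rng.choice(347, size=300, replace=False).tolist()
test_indices = [i for i in range(347) if i not in set(train_indices)]

# Row fancy-indexing of a CSR matrix; both results stay sparse (CSR).
training_set = matrix[train_indices]  # shape (300, 1000)
testing_set = matrix[test_indices]    # shape (47, 1000)
```

At this scale the indexing works as expected; the question is why the same operation crashes at the real scale.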
But `training_set = matrix[train_indices]` fails with `Segmentation fault (core dumped)`. It shouldn't be a memory problem, as I'm running this code on a server with 64 GB of RAM.

Any clue on what could be the cause?
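One thing worth checking (an assumption, not a confirmed diagnosis): scipy stores CSR index arrays as int32 when the values fit, and a column count of 5 416 812 801 exceeds the int32 maximum, so any indexing code path that keeps 32-bit indices would overflow. A quick sketch of the check, using a toy matrix for the dtype inspection:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Inspect the index dtypes scipy picked (small matrices typically get int32).
matrix = sparse_random(5, 10, density=0.3, format="csr", random_state=0)
print(matrix.indices.dtype, matrix.indptr.dtype)

# The real column count does not fit in int32, so it needs int64 indices.
print(5_416_812_801 > np.iinfo(np.int32).max)  # True
```

If the real matrix's `indices`/`indptr` arrays are int32, or an operation downcasts them, that would be a plausible source of the crash.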