I'm working with relatively large high-dimensional sparse arrays using scipy.sparse.
The actual data and row/column indices are no issue to store.
The problem is I end up with things like
sp.csr_matrix(([1], ([0], [0])), shape=(int(1e14), 1)).shape
which gives
MemoryError: Unable to allocate 728. TiB for an array with shape (100000000000001,) and data type int64
since it looks like scipy tries to allocate an index array with one entry per row (the CSR indptr?).
Is there a good workaround for this? Would using coo_matrix fix it?
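For what it's worth, a quick sketch suggests COO does sidestep the problem, since it stores only the nonzero coordinates and values with no per-row or per-column pointer array:

```python
import scipy.sparse as sp

# COO stores just (row, col, data) triples for the nonzeros,
# so the enormous row dimension costs no memory by itself.
m = sp.coo_matrix(([1], ([0], [0])), shape=(int(1e14), 1))
print(m.shape)  # (100000000000000, 1)
print(m.nnz)    # 1

# Caveat: calling m.tocsr() would try to build the giant per-row
# pointer array and raise the same MemoryError.
```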
Update
It turns out I'm just an idiot and should have paid better attention to whether I was using a CSC matrix or a CSR matrix.
CSR compresses along rows: it stores an indptr array with one entry per row (plus one), so 1e14 rows means ~728 TiB of pointers before a single value is stored. CSC instead stores one indptr entry per column. For data stored like this (tall and skinny, in what I am sure is a terrible format for making use of sparsity), CSC will work way better.
In any case, both of these work fine
wat = sp.csc_matrix(([1], ([0], [0])), shape=(int(1e14), 1))
wat2 = sp.csr_matrix(([1], ([0], [0])), shape=(1, int(1e14)))
so this was just a misunderstanding of what CSC and CSR do for us.
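A quick sketch confirming where the memory goes: in both working layouts the indptr array has just two entries, while a CSR matrix with 1e14 rows would need an indptr of 1e14 + 1 int64 entries, which is the ~728 TiB figure from the error message.

```python
import scipy.sparse as sp

n = int(1e14)

# Tall-and-skinny: CSC's indptr has one entry per column (+1) -> tiny.
wat = sp.csc_matrix(([1], ([0], [0])), shape=(n, 1))
print(wat.indptr.shape)   # (2,)

# Wide-and-flat: CSR's indptr has one entry per row (+1) -> also tiny here.
wat2 = sp.csr_matrix(([1], ([0], [0])), shape=(1, n))
print(wat2.indptr.shape)  # (2,)

# The failing case, csr_matrix with shape (n, 1), would need an indptr
# of (n + 1) int64 entries -- roughly the 728 TiB in the MemoryError:
print((n + 1) * 8 / 2**40)  # ~728 (TiB)
```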