I have two pandas sparse dataframes, big_sdf and bigger_sdf.

When I try to multiply them:

result = big_sdf @ bigger_sdf

I get an error:

"numpy.core._exceptions.MemoryError: Unable to allocate 3.6 TiB for an array with shape (160815, 3078149) and data type int64"

So I tried converting both sparse dataframes to SciPy CSR matrices and multiplying those instead, but the conversion fails:

from scipy.sparse import csr_matrix
csr_big = csr_matrix(big_sdf)
csr_bigger = csr_matrix(bigger_sdf)

When I run the last line, I get this error:

"ValueError: unrecognized csr_matrix constructor usage"

This only happens for the bigger matrix; the smaller one is converted successfully.

Any ideas? Is there perhaps a native pandas method for multiplying sparse dataframes that I missed?

Thanks in advance!

AlonBA
  • `csr_matrix` does not "know" anything specific about 'sparse dataframes'. `csr_matrix(big_sdf)` first does `np.asarray(big_sdf)`, or effectively `big_sdf.to_numpy()`. Compare that for your two frames, and see what's different. – hpaulj Dec 12 '22 at 08:37
  • Actually, I should have first asked for the FULL error message, or messages, since you talk about two errors. – hpaulj Dec 12 '22 at 08:38
  • Provide a minimal reproducible example. Anyway, did you try `big_sdf.sparse.to_coo() @ bigger_sdf.sparse.to_coo()`? – relent95 Dec 12 '22 at 12:32
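
A minimal sketch of the `.sparse.to_coo()` route suggested in the comment above, using two tiny stand-in frames (`small_sdf` and `other_sdf` are hypothetical names, not the asker's data). It assumes every column has a sparse dtype, which `.sparse.to_coo()` requires, and it also reproduces the size arithmetic behind the 3.6 TiB figure in the error message. Whether the real product fits in memory is an open question, since a product of two sparse matrices can be far denser than its inputs.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Size of the dense int64 array from the MemoryError: rows * cols * 8 bytes.
dense_tib = 160815 * 3078149 * np.dtype(np.int64).itemsize / 2**40

# Hypothetical stand-ins for big_sdf and bigger_sdf (shapes 2x3 and 3x2).
a = sparse.csr_matrix(np.array([[1, 0, 0], [0, 0, 2]], dtype=np.int64))
b = sparse.csr_matrix(np.array([[0, 3], [0, 0], [4, 0]], dtype=np.int64))
small_sdf = pd.DataFrame.sparse.from_spmatrix(a)
other_sdf = pd.DataFrame.sparse.from_spmatrix(b)

# The .sparse accessor converts to a SciPy COO matrix without densifying;
# CSR is the usual format for matrix products.
left = small_sdf.sparse.to_coo().tocsr()
right = other_sdf.sparse.to_coo().tocsr()

# The product stays sparse; no dense intermediate array is allocated.
product = left @ right

# Optionally wrap the result back into a sparse DataFrame.
result_sdf = pd.DataFrame.sparse.from_spmatrix(
    product, index=small_sdf.index, columns=other_sdf.columns
)
```

Going through the accessor avoids the `np.asarray` step that `csr_matrix(big_sdf)` performs on a DataFrame, which is where a dense array of the full shape would otherwise be requested.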
