When I try to convert a SciPy sparse matrix to a NumPy array (using the `toarray` method), I get the following error:

array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.

The shape (shape[0], shape[1]) of the SciPy matrix is (15561, 22417).

Any ideas on how to work around this?
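
For scale, the dense footprint follows directly from the formula in the error message: number of elements times bytes per element. The sketch below is a rough estimate only. The matrix's `dtype` is never reported in this thread, so the three item sizes (and the helper name `dense_nbytes`) are assumptions chosen for illustration.

```python
# Shape reported in the question.
rows, cols = 15561, 22417

# Bytes per element for a few candidate dtypes. The real dtype is not
# stated in the thread, so these are assumptions.
ITEMSIZE = {"float64": 8, "int32": 4, "int16": 2}


def dense_nbytes(n_rows, n_cols, itemsize):
    """Bytes needed to hold a dense n_rows x n_cols array."""
    return n_rows * n_cols * itemsize


for name, itemsize in ITEMSIZE.items():
    gib = dense_nbytes(rows, cols, itemsize) / 2**30
    print(f"{name}: {gib:.2f} GiB")
```

With 8-byte elements the dense array needs more than 2**31 bytes, which is more than a 32-bit Python process can address in a single array. Whether the interpreter here is a 32-bit build is not confirmed in the thread.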

DannyMoshe
  • Sounds like the resulting array is/would be too large for the allocated memory on your machine. Can you process it in batches? – G. Anderson Oct 03 '18 at 15:43
  • I tried to loop through the matrix, but that did not work either. Do I need more RAM? – DannyMoshe Oct 03 '18 at 15:46
  • What is `shape[1]` of the sparse matrix? – Warren Weckesser Oct 03 '18 at 15:48
  • In [this answer](https://stackoverflow.com/questions/14525344/whats-the-maximum-size-of-a-numpy-array) there are a few options for reducing memory size such as decreasing precision or increasing memory allocation. Maybe see if any of those solutions works for you, or see if there's another way to do what you need to do to your data – G. Anderson Oct 03 '18 at 15:49
  • @WarrenWeckesser it's 22417 – DannyMoshe Oct 03 '18 at 15:55
  • Also, what is the `dtype` of your array? (I.e. `arr.dtype`) – Warren Weckesser Oct 03 '18 at 16:17
  • And what platform are you running this on? Windows? – Warren Weckesser Oct 03 '18 at 16:20
  • @WarrenWeckesser - Running Windows. As for the type, I'm not sure how to check the type of a scipy sparse matrix ... – DannyMoshe Oct 03 '18 at 16:25
  • Check the `dtype` attribute. – Warren Weckesser Oct 03 '18 at 16:26
  • scipymatrix.dtype is "numpy.dtype" ... – DannyMoshe Oct 03 '18 at 16:29
  • That doesn't look right. If the sparse matrix is called, say, `m`, what is the output of `print(m.dtype)`? – Warren Weckesser Oct 03 '18 at 16:31
  • This is a common problem. The whole point of using `scipy.sparse` is to be able to create and use matrices that would be too large if dense. Why do you need to convert the sparse matrix with `toarray`? Changing the `dtype` might reduce the dense size by a factor of 2 or 4. Even if such an array fits, you won't be able to do much without making a (temporary) copy or two (and hitting the memory error again). – hpaulj Oct 03 '18 at 16:44
  • @hpaulj I want to convert it to a Pandas dataframe – DannyMoshe Oct 03 '18 at 16:49
  • A sparse dataframe, or a massive one with mostly 0s? What does this sparse matrix represent? – hpaulj Oct 03 '18 at 17:28
  • @hpaulj, it represents word counts. I think the issue is I'm converting the entire matrix into a dataframe then aggregating (word count sum). I should probably aggregate, then convert. – DannyMoshe Oct 04 '18 at 08:28
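
The "aggregate, then convert" idea from the last comment can be sketched as follows. This is a minimal illustration, not the asker's actual code: the tiny matrix `counts`, the labels in `vocab`, and the choice to sum over documents (axis 0) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-in for the word-count matrix: 4 documents x 3 words.
counts = sparse.csr_matrix(np.array([
    [1, 0, 2],
    [0, 0, 3],
    [4, 0, 0],
    [0, 1, 0],
]))
vocab = ["alpha", "beta", "gamma"]  # hypothetical column labels

# Sum over axis 0 directly on the sparse matrix. The result has only
# one entry per word, so the full dense array is never built.
totals = np.asarray(counts.sum(axis=0)).ravel()

# Only the small aggregated vector is handed to pandas.
word_totals = pd.Series(totals, index=vocab, name="count")
print(word_totals)
```

For the shape in the question, the aggregated result would hold 22417 values instead of roughly 349 million.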

0 Answers