
If I save a CSR matrix using numpy.save(), then try to load it in via numpy.load(), a huge number of properties disappear: in particular there is no shape, and it's not possible to access values by index. Is this normal?

In the example below I create a CSR matrix from three arrays: the data, the indices and the index pointers. I then save it, load it back, and show that the shape and index operations fail on the loaded version.

> import numpy as np
> import scipy as sp
> import scipy.sparse as ssp

> wd
Out[1]: 
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int16)

> wi
Out[1]: 
array([200003,      1, 200009, 300000, 200002, 200006, 200007, 250000,
       300500, 200010, 300501, 200001, 200000,      0, 200008, 200004,
       200005, 200011, 200018,      2, 200019, 200013, 300001, 200014,
       200015, 200022, 200012, 200020, 200021, 200016, 200017, 200023,
       200027,      2, 200030, 200032, 200028, 200033, 200031, 200029,
       200026, 200025, 200024, 200047,      2, 200042, 200045, 200046,
       200028, 200038, 200040, 200039, 200036, 200037, 200012, 200048,
       200041, 200035, 200044, 200043, 200034, 200049,      3, 200050,
            4], dtype=int32)

> wp
Out[1]: array([ 0, 18, 31, 43, 61, 65], dtype=int32)

> ww = ssp.csr_matrix((wd,wi,wp))

> ww.shape
Out[1]: (5, 300502)

> ww[2,3]
Out[1]: 0

> ww[0,0]
Out[1]: 1

> np.save('/Users/bryanfeeney/Desktop/ww.npy', ww)
> www = np.load('/Users/bryanfeeney/Desktop/ww.npy')

> www.shape
Out[1]: ()

> www[2,3]
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/IPython/core/interactiveshell.py", line 2732, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-1-35f1349fb755>", line 1, in <module>
    www[2,3]
IndexError: 0-d arrays can only use a single () or a list of newaxes (and a single ...) as an index

> www[0,0]
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/IPython/core/interactiveshell.py", line 2732, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-1-43c5da404060>", line 1, in <module>
    www[0,0]
IndexError: 0-d arrays can only use a single () or a list of newaxes (and a single ...) as an index

Here's the version information for the python runtime, numpy and scipy respectively.

> sys.version
Out[1]: '3.3.2 (default, May 21 2013, 11:50:47) \n[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]'

> np.__version__
Out[1]: '1.7.1'

> sp.__version__
Out[1]: '0.12.0'
Feenaboccles
  • In response to the comment that the docs state np.load/np.save work on arrays: that might be it, but is there really no clean way of storing and loading CSR matrices? Thus far I've been using the trick of storing and loading the data, indices and indptr, which seems messy – Feenaboccles Aug 27 '13 at 18:52
  • Cross over comments and posts. I can only see the "messy" way. But someone else may answer – doctorlove Aug 27 '13 at 18:56
  • possible duplicate of [Save / load scipy sparse csr\_matrix in portable data format](http://stackoverflow.com/questions/8955448/save-load-scipy-sparse-csr-matrix-in-portable-data-format) – Saullo G. P. Castro Aug 27 '13 at 20:11
  • I was aware of that question already (see my comment about the "messy way" above). I want to know whether I am right in thinking that numpy/scipy have no single method for saving or loading matrices, and moreover that numpy.save() is essentially buggy in that it accepts matrices it cannot save, writing a file that loses information. – Feenaboccles Aug 28 '13 at 10:34

2 Answers


The three arrays wd, wi and wp together make up your sparse matrix. numpy.save only deals with NumPy arrays, so you need to save all three of them (and keep a note of the shape).
Then, having loaded them back, say as wwd, wwi and wwp, build a new matrix:

from scipy.sparse import csr_matrix
new_csr = csr_matrix((wwd, wwi, wwp), shape=(M, N))  # (M, N) is the original matrix's shape

See here for a similar discussion.
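
As a rough sketch of that approach, the three component arrays and the shape can go into a single archive with np.savez rather than three separate files. The small matrix, the temporary path and the key names (data, indices, indptr, shape) below are made up for illustration; they are not the asker's arrays or any fixed convention.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as ssp

# Small stand-in matrix (hypothetical data, not the arrays from the question).
data = np.array([1, 1, 1], dtype=np.int16)
indices = np.array([0, 3, 1], dtype=np.int32)
indptr = np.array([0, 2, 3], dtype=np.int32)
ww = ssp.csr_matrix((data, indices, indptr), shape=(2, 4))

# Hypothetical location for the archive.
path = os.path.join(tempfile.mkdtemp(), "ww_parts.npz")

# Store the three component arrays plus the shape in one .npz archive.
np.savez(path, data=ww.data, indices=ww.indices, indptr=ww.indptr,
         shape=np.array(ww.shape))

# Rebuild the matrix from the saved components.
with np.load(path) as parts:
    www = ssp.csr_matrix(
        (parts["data"], parts["indices"], parts["indptr"]),
        shape=tuple(int(n) for n in parts["shape"]),
    )
```

Saving the shape explicitly matters because trailing all-zero columns cannot be inferred from the indices alone.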

doctorlove

This seems to be a bug, but you can pickle the whole sparse-matrix object:

import pickle
# Pickle files must be opened in binary mode on Python 3.
with open('ww.pkl', 'wb') as f:
    pickle.dump(ww, f)

and when you want to load:

with open('ww.pkl', 'rb') as f:
    ww = pickle.load(f)
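
For readers on a newer SciPy than the 0.12.0 used in the question: releases from 0.19 onward ship scipy.sparse.save_npz and scipy.sparse.load_npz, which write and read a sparse matrix in one call. The sketch below assumes such a version is installed; the small matrix and the temporary path are illustrative only.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as ssp

# Small stand-in matrix (hypothetical data, not the one from the question).
ww = ssp.csr_matrix(np.array([[1, 0, 0, 1],
                              [0, 1, 0, 0]], dtype=np.int16))

# Hypothetical location for the file.
path = os.path.join(tempfile.mkdtemp(), "ww_sparse.npz")

# Requires SciPy 0.19 or later; not available in SciPy 0.12.0.
ssp.save_npz(path, ww)
www = ssp.load_npz(path)
```

Unlike the array returned by numpy.load in the question, the loaded object here is a sparse matrix again, so www.shape and www[i, j] behave as they did before saving.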
Saullo G. P. Castro
  • Since the convention is to use the 'npy' extension for the numpy format, and the file saved by `pickle.dump` is *not* using the numpy file format, I would suggest using a different filename extension when 'pickle' is used (e.g. 'pkl'). – Warren Weckesser Aug 27 '13 at 20:23