42

I am working with arrays of different shapes and I want to save them all with numpy.save. Consider that I have:

mat1 = numpy.arange(8).reshape(4, 2)
mat2 = numpy.arange(9).reshape(2, 3)
numpy.save('mat.npy', numpy.array([mat1, mat2]))

It works. But when the two matrices have one dimension of the same size, it does not work.

mat1 = numpy.arange(8).reshape(2, 4)
mat2 = numpy.arange(10).reshape(2, 5)
numpy.save('mat.npy', numpy.array([mat1, mat2]))

It causes:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: could not broadcast input array from shape (2,4) into shape (2)

Note that the problem is caused by numpy.array([mat1, mat2]), not by numpy.save.
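The failure can be reproduced without writing anything to disk. This is a minimal sketch: the variable name `error` is only for illustration, and the exact error message differs between NumPy versions, so only the exception type is inspected.

```python
import numpy as np

mat1 = np.arange(8).reshape(2, 4)
mat2 = np.arange(10).reshape(2, 5)

error = None
try:
    # NumPy cannot combine shapes (2, 4) and (2, 5) into one regular array.
    np.array([mat1, mat2])
except ValueError as exc:
    error = exc

# numpy.save is never reached; the exception comes from numpy.array.
print(type(error).__name__)
```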

I know that such an array is possible:

>>> numpy.array([[[1, 2]], [[1, 2], [3, 4]]])
array([[[1, 2]], [[1, 2], [3, 4]]], dtype=object)

So all I want is to save the two arrays, mat1 and mat2, at once.

zardav
  • Have you considered using `np.savez` or pickle with a binary protocol instead? `savez` saves multiple arrays, `save` only saves a single array. – Joe Kington Feb 01 '16 at 14:48
  • It works on my computer. What version of python are you using? – CoMartel Feb 01 '16 at 14:51
  • If the first dimension of `mat1` and `mat2` are the same, `np.array(...)` produces this error. You can get around this error by initializing a `np.empty((2,),object)` array, and filling it with the element arrays. Also do that if all the dimensions are the same (to prevent concatenation). – hpaulj Feb 13 '20 at 02:35
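The workaround described in the last comment could look roughly like the sketch below. It assumes a temporary directory for the file and uses the illustrative names `both`, `path`, and `loaded`. Object arrays are stored via pickle, so loading them needs `allow_pickle=True` on recent NumPy versions.

```python
import os
import tempfile

import numpy as np

mat1 = np.arange(8).reshape(2, 4)
mat2 = np.arange(10).reshape(2, 5)

# Allocate a 1-D object array first, then assign each element,
# so NumPy never tries to merge the two shapes.
both = np.empty((2,), dtype=object)
both[0] = mat1
both[1] = mat2

# Assumption: a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), 'mat.npy')
np.save(path, both)  # object arrays are written using pickle

# Recent NumPy versions refuse to unpickle unless allow_pickle=True is passed.
loaded = np.load(path, allow_pickle=True)
print(loaded[0].shape, loaded[1].shape)
```

Only load pickled files from sources you trust, since unpickling can execute arbitrary code.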

2 Answers

88

If you'd like to save multiple arrays in the same format as np.save, use np.savez.

For example:

import numpy as np

arr1 = np.arange(8).reshape(2, 4)
arr2 = np.arange(10).reshape(2, 5)
np.savez('mat.npz', name1=arr1, name2=arr2)

data = np.load('mat.npz')
print(data['name1'])
print(data['name2'])

If you have several arrays, you can expand the arguments:

import numpy as np

data = [np.arange(8).reshape(2, 4), np.arange(10).reshape(2, 5)]
np.savez('mat.npz', *data)

container = np.load('mat.npz')
data = [container[key] for key in container]

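Note that the order is not guaranteed to be preserved when you iterate over the container. Arrays passed positionally are stored under the names arr_0, arr_1, and so on, so you can recover the original order by looking them up by name. If you do need the order preserved directly, you might consider using pickle instead.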
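If the arrays were passed positionally, one way to get them back in their original order is to index the container by the automatically generated names `arr_0`, `arr_1`, and so on, rather than relying on iteration order. This is a minimal sketch; the temporary directory and the name `restored` are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

data = [np.arange(8).reshape(2, 4), np.arange(10).reshape(2, 5)]

# Assumption: a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), 'mat.npz')
np.savez(path, *data)

with np.load(path) as container:
    # Look each array up by its generated name instead of iterating.
    restored = [container['arr_%d' % i] for i in range(len(container.files))]

print([arr.shape for arr in restored])
```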

If you use pickle, be sure to specify a binary protocol; otherwise you'll write things using the ASCII pickle format, which is particularly inefficient for numpy arrays. With a binary protocol, ndarrays more or less pickle to the same format as np.save/np.savez. For example:

# Note: On Python 2.x, use "import cPickle as pickle" instead. The rest is identical.
import pickle
import numpy as np

data = [np.arange(8).reshape(2, 4), np.arange(10).reshape(2, 5)]

with open('mat.pkl', 'wb') as outfile:
    pickle.dump(data, outfile, pickle.HIGHEST_PROTOCOL)

with open('mat.pkl', 'rb') as infile:
    result = pickle.load(infile)

In this case, result and data will have identical contents and the order of the input list of arrays will be preserved.
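A quick way to check that claim is to round-trip the list in memory and compare element by element. This is only a sketch for Python 3; the names `payload` and `same` are illustrative.

```python
import pickle

import numpy as np

data = [np.arange(8).reshape(2, 4), np.arange(10).reshape(2, 5)]

# Serialize to bytes and back, using the highest binary protocol available.
payload = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
result = pickle.loads(payload)

# Compare position by position: same length, same contents, same order.
same = len(result) == len(data) and all(
    np.array_equal(a, b) for a, b in zip(data, result)
)
print(same)
```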

Joe Kington
  • 2
    Suppose I have a list of arrays and I want to save them all, and later load them all. – zardav Feb 01 '16 at 14:57
  • Also, is there a solution for the general problem I described? – zardav Feb 01 '16 at 15:05
  • @Dubon - I'm not quite sure what you're referring to by "the general problem". If you mean writing/reading arbitrary python objects to/from disk, `pickle` is what you're looking for. – Joe Kington Feb 01 '16 at 15:08
  • The general problem is that I can't do `numpy.array([arr1, arr2])` if `arr1.shape[0] == arr2.shape[0]` and I don't want broadcasting. – zardav Feb 01 '16 at 15:11
  • @Dubon - No, there's no way to make a numpy array have a different shape along a dimension. `ndarray`s have to have a consistent shape. (Internally, an `ndarray` is a memory buffer with a simple description of shape/strides/type. "Ragged" arrays can't be described that way.) If you want "ragged" arrays, use a list of arrays instead. A single array can't contain arrays of different shapes. (Well, excluding object arrays, but you _really_ don't want those.) – Joe Kington Feb 01 '16 at 15:13
  • 2
    @Dubon - You're getting an object array. It's not a "real" array in the same sense as the others. It's basically a very inefficient `list`. You're better off using a `list` instead of creating an object array. As you've noted, this particular result is a _1D_ array of other arrays. It won't broadcast like a 2D or 3D array because it's 1D. You also won't be able to use mathematical operations in quite the same way (or rather, you'll be hit with some nasty surprises). If you're not already very familiar with `numpy`, don't use object arrays. – Joe Kington Feb 01 '16 at 15:16
  • 1
    To complement Joe's comment and see why he is right when he says *It's not a "real" array in the same sense as the others* , see [this](https://stackoverflow.com/questions/45426587/what-is-going-on-behind-this-numpy-selection-behavior). – keepAlive Sep 04 '17 at 13:41
  • @JoeKington When you say order is not preserved. Does it mean in the list or in the numpy arrays? – Coddy Sep 15 '20 at 15:38
4

Small addition: if you'd like to use numpy.savez() and preserve names associated with the saved arrays (instead of arr_0, arr_1, ...) you can pass a dictionary as **kwargs using the double-star operator.

d = {}
d['a'] = np.random.randint(10, size=5)
d['b'] = np.random.randint(10, size=5)
print(d)
# {'a': array([8, 9, 5, 0, 0]), 'b': array([1, 7, 6, 9, 2])}

np.savez("test", **d)
container = np.load("test.npz")

e = {name: container[name] for name in container}
print(e)
# {'a': array([8, 9, 5, 0, 0]), 'b': array([1, 7, 6, 9, 2])}
L_W