
Python 3.7, NumPy: I need to save a third-order object that was created with NumPy. To be precise, it is a list of 2-D arrays. After loading, each array gets matrix-multiplied with a vector using numpy.dot(). Is there a way to save this object (for example in a .txt file) without it losing its format?

If I simply write the object to a .txt file with .write(), it gets converted to a string. I could of course convert that back into a float array, but before doing that I wanted to know whether there is a simpler or more efficient way.

That would look something like this:

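    import numpy as np

    Size1, Size2, Size3 = 3, 4, 5  # example sizes

    BigObject = []
    for i in range(Size1):  # range(Size1), not the 2-tuple (0, Size1)
        BigObject.append(np.random.uniform(-1, 1, (Size2, Size3)))

    with open("test.txt", "w") as output:
        output.write(str(BigObject))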

That is how I save it. This is how I read it back:

    with open("test.txt", "r") as infile:
        NewBigObject = infile.read()

This gives me back a string in NewBigObject, which I cannot matrix-multiply with a vector.

How BigObject gets saved is not important; I just want to know whether there is a smart way of saving it without losing the format. I could run a series of split() and float() calls to get the original object back, but can I do this faster or more elegantly?

kmario23
Jason D.

1 Answer


Here is a way to save the arrays as a dict rather than as a list (saving a list of equal-shaped arrays with np.save stacks them into a single 3-D array, which we don't want here), and then load them back for reading without losing the array format.
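To see what happens to a plain list: np.save converts whatever it is given into one ndarray before writing, so a list of equal-shaped 2-D arrays comes back as a single 3-D array. This is a minimal sketch; the sizes are arbitrary placeholders and the file is written to a temporary directory.

```python
import os
import tempfile

import numpy as np

# Arbitrary example sizes (placeholders, not taken from the question)
n_mats, n_rows, n_cols = 3, 4, 5
list_of_arrs = [np.random.uniform(-1, 1, (n_rows, n_cols)) for _ in range(n_mats)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "as_list.npy")
    # np.save turns the list into one ndarray before writing it
    np.save(path, list_of_arrs)
    loaded = np.load(path)

# The list came back as a single 3-D float array, not as a list
print(type(loaded), loaded.shape)  # <class 'numpy.ndarray'> (3, 4, 5)
```

Each loaded[i] is still a 2-D array that can be passed to np.dot; the dict approach matters when you want to keep the arrays as separate, individually keyed objects.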

    # sample array to work with
    In [76]: arr = np.arange(12).reshape(4, 3)

    # make a dict of, say, 4 copies of the array
    In [77]: dict_of_arrs = {idx: arr for idx in range(4)}

    # serialize it to disk; it will be saved as `serialized_arrays.npy`
    In [78]: np.save('serialized_arrays', dict_of_arrs)

    # load it back for reading/processing
    # (NumPy >= 1.16.3 needs allow_pickle=True to load a pickled dict)
    In [79]: loaded_arrs = np.load('serialized_arrays.npy', allow_pickle=True)

    # the dict comes back wrapped in a 0-d object array; take its only element
    In [80]: loaded_arrs.ravel()[0]
    Out[80]:
    {0: array([[ 0,  1,  2],
            [ 3,  4,  5],
            [ 6,  7,  8],
            [ 9, 10, 11]]), 1: array([[ 0,  1,  2],
            [ 3,  4,  5],
            [ 6,  7,  8],
            [ 9, 10, 11]]), 2: array([[ 0,  1,  2],
            [ 3,  4,  5],
            [ 6,  7,  8],
            [ 9, 10, 11]]), 3: array([[ 0,  1,  2],
            [ 3,  4,  5],
            [ 6,  7,  8],
            [ 9, 10, 11]])}

The above returns a dict; you can then iterate over it to access the arrays. If you prefer, you can give dict_of_arrs more meaningful keys when you create it.
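Applied to the use case in the question, a round trip with named keys followed by the matrix-vector products might look like the sketch below. The key names (layer_0, ...) and the sizes are hypothetical, and .item() is used here as an equivalent of ravel()[0] for unwrapping the 0-d object array. Recent NumPy versions require allow_pickle=True to load the pickled dict.

```python
import os
import tempfile

import numpy as np

# Hypothetical keys and sizes, chosen only for illustration
dict_of_arrs = {f"layer_{i}": np.random.uniform(-1, 1, (4, 3)) for i in range(3)}
vec = np.ones(3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "serialized_arrays.npy")
    np.save(path, dict_of_arrs)
    # The dict is stored as a pickled 0-d object array; .item() unwraps it
    loaded_arrs = np.load(path, allow_pickle=True).item()

# Iterate over the restored dict and multiply each matrix by the vector
results = {key: np.dot(mat, vec) for key, mat in loaded_arrs.items()}
for key, res in results.items():
    print(key, res.shape)  # each result has shape (4,)
```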

kmario23
  • And serialized_arrays.npy is a file in the program's folder, perfect. Thank you very much, that helped a lot! – Jason D. Apr 23 '19 at 13:33
  • Update: I've come back to this now, and it seems that `loaded_arrs = np.load('serialized_arrays.npy')` causes trouble. I fixed it with `loaded_arrs = np.load('serialized_arrays.npy', allow_pickle=True)`. Now I hope I don't break anything with that. – Jason D. Jun 16 '19 at 12:44
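For context on the allow_pickle issue: a dict saved with np.save is stored as a pickled object array, and as far as I know NumPy 1.16.3 and later default np.load to allow_pickle=False, so loading such a file raises a ValueError unless you opt in. Opting in is fine for files you wrote yourself; the NumPy docs warn that loading pickled data from untrusted sources can execute arbitrary code. A minimal sketch of both cases, passing allow_pickle explicitly so it does not depend on the installed default:

```python
import os
import tempfile

import numpy as np

dict_of_arrs = {idx: np.arange(12).reshape(4, 3) for idx in range(2)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "serialized_arrays.npy")
    np.save(path, dict_of_arrs)

    # Without pickle support, NumPy refuses to load the object array
    try:
        np.load(path, allow_pickle=False)
        refused = False
    except ValueError as err:
        refused = True
        print("refused:", err)

    # Opting in restores the dict (only do this for files you trust)
    restored = np.load(path, allow_pickle=True).item()

print(refused, sorted(restored))  # True [0, 1]
```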