
I need to save several NumPy arrays and Python objects to disk, and I want to minimize I/O. I don't mind if the loader or saver has to do extra work in memory, but the I/O footprint (actual filesystem access) should be as low as possible, since our cluster has problems when many jobs access the filesystem at the same time.

I tried with:

import numpy as np

my_data = dict()

my_data['r1'] = np.random.randint(3, size=(100,200))
my_data['rs'] = np.random.randint(3, size=(50,400))

my_data['annotation_info'] = 'Two random arrays'
my_data['current_date']    = 'July 28' 

np.savez('test.npz', my_data = my_data)

But when I load this, I get:

temp = np.load('test.npz')
my_data = temp['my_data']
my_data['r1']

ValueError: field named features not found

Also, my_data seems to now be an array, with my_data.shape returning (). Oddly enough, if I do:

print(my_data)

I get:

{'current_date': 'July 28', 'rs': array([[0, 1, 0, ..., 0, 2, 0],
       [1, 1, 1, ..., 1, 1, 0],
       [2, 1, 1, ..., 1, 1, 0],
       ..., 
       [1, 0, 2, ..., 2, 0, 1],
       [0, 2, 0, ..., 1, 1, 0],
       [1, 1, 0, ..., 1, 1, 1]]), 'annotation_info': 'Two random arrays',
 'r1': array([[2, 0, 1, ..., 0, 2, 2],
       [0, 0, 2, ..., 0, 2, 1],
       [2, 2, 2, ..., 1, 0, 0],
       ..., 
       [0, 2, 1, ..., 2, 0, 0],
       [0, 0, 1, ..., 2, 1, 0],
       [2, 1, 2, ..., 0, 2, 2]])}

Update

If I do what unutbu recommends:

np.savez('test.npz', **my_data)
my_data = np.load('test.npz')
my_variable = my_data['annotation_info']

my_variable is not a string. For example, my_variable.upper() raises:

numpy.ndarray object has no attribute 'upper' 

In fact type(my_variable) returns:

numpy.ndarray

Again, my_variable[0] throws an error, and my_variable.shape returns (), so it seems to be a rank-0 array.

How can I access the actual object stored in the array?

Amelio Vazquez-Reina

1 Answer


Try unpacking the dictionary, so that each entry is saved as its own array in the archive:

np.savez('test.npz', **my_data)
my_data = np.load('test.npz')
print(my_data['r1'])
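
To see the difference between the two calls, here is a small sketch that saves the same dictionary both ways and inspects what ends up in the archive. The temporary directory, the file names nested.npz and flat.npz, and the small stand-in array are only assumptions for illustration. Note that reading back the single-keyword version needs allow_pickle=True on recent NumPy versions, which should only be used on files you trust.

```python
import os
import tempfile

import numpy as np

# Small stand-in for the dictionary in the question (illustrative values).
my_data = {
    "r1": np.zeros((2, 3), dtype=int),
    "annotation_info": "Two random arrays",
}
tmp_dir = tempfile.mkdtemp()

# Passing the whole dict as one keyword: NumPy wraps it in a single
# 0-d object array, which is stored via pickle.
nested_path = os.path.join(tmp_dir, "nested.npz")
np.savez(nested_path, my_data=my_data)
with np.load(nested_path, allow_pickle=True) as archive:
    nested_keys = list(archive.files)
    wrapped = archive["my_data"]
    wrapped_shape = wrapped.shape  # () because the array is 0-dimensional
    unwrapped = wrapped.item()     # the original dict

# Unpacking the dict: each key becomes its own array in the archive.
flat_path = os.path.join(tmp_dir, "flat.npz")
np.savez(flat_path, **my_data)
with np.load(flat_path) as archive:
    flat_keys = sorted(archive.files)
    r1 = archive["r1"]

print(nested_keys, wrapped_shape, type(unwrapped).__name__)
print(flat_keys, r1.shape)
```

With the unpacked form, the numeric arrays are ordinary entries and no pickling is involved for them or for the plain strings.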

NumPy saves the strings as 0-dimensional NumPy arrays, which is why shape returns () and indexing with [0] fails. To access the strings as Python objects, you could use the item method:

my_data = np.load('test.npz')
my_variable = my_data['annotation_info'].item()
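
Putting it together, a minimal sketch of a full round trip might look like the following. It assumes that every 0-dimensional entry in the archive should be turned back into a plain Python object and that everything else should stay an array; the temporary path and the name restored are hypothetical.

```python
import os
import tempfile

import numpy as np

my_data = {
    "r1": np.random.randint(3, size=(100, 200)),
    "rs": np.random.randint(3, size=(50, 400)),
    "annotation_info": "Two random arrays",
    "current_date": "July 28",
}

# Write to a temporary location (an assumption, to keep the sketch self-contained).
path = os.path.join(tempfile.mkdtemp(), "test.npz")
np.savez(path, **my_data)

# Rebuild a plain dict: unwrap 0-d arrays with .item(), keep real arrays as-is.
restored = {}
with np.load(path) as archive:
    for key in archive.files:
        value = archive[key]
        restored[key] = value.item() if value.ndim == 0 else value

print(restored["annotation_info"].upper())
print(restored["r1"].shape)
```

After this, restored['annotation_info'] is an ordinary str, so string methods such as upper() work again.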
unutbu