
I want to write numpy arrays to a file and easily load them in again.

I would like to have a function save() that preferably works in the following way:

data = [a, b, c, d]
save('data.h5', data)

which then does the following

h5f = h5py.File('data.h5', 'w')
h5f.create_dataset('a', data=a)
h5f.create_dataset('b', data=b)
h5f.create_dataset('c', data=c)
h5f.create_dataset('d', data=d)
h5f.close()

Then I would like to load this data again easily, for example with:

a, b, c, d = load('data.h5')

which does the following:

h5f = h5py.File('data.h5', 'r')
a = h5f['a'][:]
b = h5f['b'][:]
c = h5f['c'][:]
d = h5f['d'][:]
h5f.close()

I can think of the following for saving the data:

h5f = h5py.File('data.h5', 'w')
data_str = ['a', 'b', 'c', 'd']
for name in data_str:
    h5f.create_dataset(name, data=eval(name))
h5f.close()

I can't think of a similar way of using data_str to then load the data again.

johnbaltis
  • Have you looked at `numpy.savez`? It's not HDF5, but it will accomplish your task rather easily. – farenorth May 01 '15 at 16:11
  • Is this helpful: http://stackoverflow.com/questions/4357851/creating-or-assigning-variables-from-a-dictionary-in-python ? – farenorth May 01 '15 at 17:13
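Following up on the `numpy.savez` suggestion, here is a minimal sketch of the same save/load pattern using NumPy's own `.npz` format (not HDF5). The example arrays and the file name `data.npz` are illustrative assumptions:

```python
import os
import tempfile

import numpy as np

# Illustrative arrays standing in for a, b, c, d
a, b, c, d = np.arange(3), np.ones((2, 2)), np.zeros(4), np.eye(2)

path = os.path.join(tempfile.mkdtemp(), 'data.npz')

# Keyword arguments become the array names stored inside the .npz archive
np.savez(path, a=a, b=b, c=c, d=d)

# np.load returns an NpzFile; index it by name to get each array back
with np.load(path) as npz:
    a2, b2, c2, d2 = (npz[name] for name in ('a', 'b', 'c', 'd'))
```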

2 Answers


Rereading the question (was this edited or not?), I see load is supposed to function as:

a, b, c, d = load('data.h5')

This eliminates the global-variable-names issue I worried about earlier. Just return the 4 arrays (as a tuple), and the calling expression takes care of assigning names. This way, the global variable names do not have to match the names in the file, nor the names used inside the function.

import h5py

def load(filename):
    # The with-block closes the file even if a dataset is missing
    with h5py.File(filename, 'r') as h5f:
        a = h5f['a'][:]
        b = h5f['b'][:]
        c = h5f['c'][:]
        d = h5f['d'][:]
    return a, b, c, d

Or using a data_str parameter:

import h5py

def load(filename, data_str=('a', 'b', 'c', 'd')):
    # A tuple default avoids the shared-mutable-default pitfall of a list
    arrays = []
    with h5py.File(filename, 'r') as h5f:
        for name in data_str:
            arrays.append(h5f[name][:])
    return arrays
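A matching `save` can mirror the `data_str` version of `load`: pair each name with its array instead of looking the variable up with `eval`. This is a minimal sketch, assuming the arrays are passed in the same order as the names (this `save` signature is hypothetical, modeled on the one in the question):

```python
import os
import tempfile

import h5py
import numpy as np


def save(filename, data, data_str=('a', 'b', 'c', 'd')):
    # Hypothetical counterpart to load(): names and arrays are matched by position
    with h5py.File(filename, 'w') as h5f:
        for name, array in zip(data_str, data):
            h5f.create_dataset(name, data=array)


def load(filename, data_str=('a', 'b', 'c', 'd')):
    # Read each named dataset fully into memory, in the order given
    with h5py.File(filename, 'r') as h5f:
        return [h5f[name][:] for name in data_str]


# Round trip with illustrative arrays
a, b, c, d = np.arange(5), np.ones((2, 3)), np.zeros(2), np.eye(3)
path = os.path.join(tempfile.mkdtemp(), 'data.h5')
save(path, [a, b, c, d])
a2, b2, c2, d2 = load(path)
```

Note that `zip` stops at the shorter sequence, so a mismatch between the number of names and arrays goes unnoticed; on Python 3.10+ `zip(..., strict=True)` raises instead.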

For loading all the variables in the file, see "Reading ALL variables in a .mat file with python h5py".
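If the names aren't known in advance, one option is to read every top-level dataset into a dictionary keyed by its name. This is a minimal sketch; the `load_all` name is hypothetical, and it skips groups rather than recursing into them:

```python
import os
import tempfile

import h5py
import numpy as np


def load_all(filename):
    # Map each top-level dataset name to its contents; groups are skipped
    with h5py.File(filename, 'r') as h5f:
        return {name: obj[()] for name, obj in h5f.items()
                if isinstance(obj, h5py.Dataset)}


path = os.path.join(tempfile.mkdtemp(), 'data.h5')
with h5py.File(path, 'w') as h5f:
    h5f.create_dataset('a', data=np.arange(3))
    h5f.create_dataset('b', data=np.ones(2))
    h5f.create_group('nested')  # a group, so load_all ignores it

arrays = load_all(path)
```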


Below is an earlier answer, which assumed you wanted to take the variable names from the file's key names.

This isn't an h5py issue. It's about creating global (or local) variables using names from a dictionary (or other structure); in other words, how to create a variable using a string as its name.

This issue has come up often in connection with argparse, a command-line parser. It gives an object like args=Namespace(a=1, b='value'). It is easy to turn that into a dictionary (with vars(args)), {'a': 1, 'b': 'value'}. But you have to do something tricky, and not Pythonic, to create a and b variables.
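For example, turning an `argparse` result into a dictionary is a single call. The `Namespace` values here are the illustrative ones from the text:

```python
import argparse

# A Namespace like the one parse_args() returns
args = argparse.Namespace(a=1, b='value')

# vars() exposes the namespace's attributes as a plain dict
opts = vars(args)

# Values are then reached by key; no new variable names are needed
print(opts['a'], opts['b'])
```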

It's even worse if you create that dictionary inside a function, and then want to create global variables (i.e. outside the function).

The trick involves assigning to locals() or globals(). But since it's un-pythonic I'm reluctant to be more specific.
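The more Pythonic route is to keep the names as dictionary keys, or to wrap them in an object if attribute access is wanted. Here is a minimal sketch using `types.SimpleNamespace`; the dictionary stands in for whatever was read from the file:

```python
from types import SimpleNamespace

import numpy as np

# Stand-in for a name -> array mapping read from a file
arrays = {'a': np.arange(3), 'b': np.ones(2)}

# Attribute-style access without creating new global variables
data = SimpleNamespace(**arrays)
total = data.a.sum() + data.b.sum()
```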

In so many words I'm saying the same thing as the accepted answer in https://stackoverflow.com/a/4467517/901925


For loading variables from a file into an IPython environment, see ipython-loading-variables-to-workspace: https://stackoverflow.com/a/28258184/901925

hpaulj
  • I was looking for a function that would load an arbitrary number of variables, but in your second part you explain that this won't be very Pythonic. Thanks for all the information! – johnbaltis May 06 '15 at 11:19

I would use deepdish (deepdish.io):

import deepdish as dd
dd.io.save(filename, {'dict1': dict1, 'obj2': obj2}, compression=('blosc', 9))
wordsforthewise