Python: how to store a numpy multidimensional array in PyTables?

Question

How can I put a numpy multidimensional array in an HDF5 file using PyTables?

From what I can tell I can't put an array field in a pytables table.

I also need to store some info about this array and be able to do mathematical computations on it.

Any suggestions?

Honestly, if you're storing a lot of just straight up ND arrays, you're better off with `h5py` instead of `pytables`. It's as simple as `f.create_dataset('name', data=x)` where `x` is your numpy array and `f` is the open hdf file. Doing the same thing in `pytables` is possible, but considerably more difficult. — Joe Kington, Jan 12 '12 at 22:16
I thought of that, but pytables has some features (tables.Expr) to do calculations directly on the arrays. Can I have that with h5py? — scripts, Jan 12 '12 at 22:22
@scripts - Not in the compressed, accelerated way that `pytables` does. (Or at least not that I know of, anyway.) `pytables` will also give you lots of nice querying abilities. `h5py` is better suited to straight-up storage and slicing of on-disk arrays (and is more pythonic, i.m.o., too). Not to plug my own answer too much, but my thoughts on the tradeoff between the two are here: http://stackoverflow.com/questions/7883646/exporting-from-importing-to-numpy-scipy-in-sqlite-and-hdf5-formats/7891137#7891137 — Joe Kington, Jan 12 '12 at 22:34
Thanks for the info, Joe Kington. For my case pytables is better suited because of its powerful querying techniques. — scripts, Jan 12 '12 at 22:43
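
For comparison, the `h5py` route mentioned in the first comment might look like the sketch below. The temporary file path, the dataset name, and the `description` attribute are illustrative assumptions, not something taken from the comments.

```python
import os
import tempfile

import h5py
import numpy as np

# Generate some data
x = np.random.random((20, 20, 20))

# Hypothetical output location in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "test_h5py.hdf")

# Store "x" as a dataset and attach a piece of metadata to it
with h5py.File(path, "w") as f:
    dset = f.create_dataset("name", data=x)
    dset.attrs["description"] = "random test data"  # hypothetical attribute

# Read back only a slice; the rest of the array stays on disk
with h5py.File(path, "r") as f:
    block = f["name"][:5, :5, :5]
    description = f["name"].attrs["description"]
```

This covers storage, slicing, and metadata, but not the `tables.Expr`-style computation discussed in the comments.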

score 34 · Accepted Answer · edited Jul 03 '19 at 11:45

There may be a simpler way, but this is how you'd go about doing it, as far as I know:

import numpy as np
import tables

# Generate some data
x = np.random.random((100,100,100))

# Store "x" in a chunked array...
f = tables.open_file('test.hdf', 'w')
atom = tables.Atom.from_dtype(x.dtype)
ds = f.create_carray(f.root, 'somename', atom, x.shape)
ds[:] = x
f.close()

If you want to specify the compression to use, have a look at tables.Filters. For example:

import numpy as np
import tables

# Generate some data
x = np.random.random((100,100,100))

# Store "x" in a chunked array with level 5 BLOSC compression...
f = tables.open_file('test.hdf', 'w')
atom = tables.Atom.from_dtype(x.dtype)
filters = tables.Filters(complib='blosc', complevel=5)
ds = f.create_carray(f.root, 'somename', atom, x.shape, filters=filters)
ds[:] = x
f.close()

There's probably a simpler way for a lot of this... I haven't used pytables for anything other than table-like data in a long while.

Note: as of pytables 3.0, f.createCArray was renamed to f.create_carray (just as openFile became open_file). It can also accept the array directly, without specifying the atom:

f.create_carray('/', 'somename', obj=x, filters=filters)
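
Since the question also asks about storing information alongside the array and computing on it, here is a minimal sketch, assuming pytables 3.x, that combines `create_carray` with node attributes and `tables.Expr`. The attribute names, the expression `2 * a + 1`, and the temporary file path are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile

import numpy as np
import tables

# Generate some data
x = np.random.random((20, 20, 20))

# Hypothetical output location in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "test.hdf")

filters = tables.Filters(complib="blosc", complevel=5)
with tables.open_file(path, "w") as f:
    # Pass the array directly; the atom and shape are taken from it
    ds = f.create_carray("/", "somename", obj=x, filters=filters)
    # Metadata can be attached as attributes on the node (names are made up)
    ds.attrs.units = "arbitrary"
    ds.attrs.description = "random test data"

with tables.open_file(path, "r") as f:
    ds = f.root.somename
    units = ds.attrs.units
    # Evaluate an expression against the on-disk array; eval() returns
    # the result as an in-memory NumPy array
    result = tables.Expr("2 * a + 1", uservars={"a": ds}).eval()
```

For results too large to hold in memory, `tables.Expr` also has a `set_output` method that directs the result into another on-disk array instead.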

Note that this can now be done much more straightforwardly using the create_array method on file objects, as described in the section 'Creating new array objects' at http://pytables.github.io/usersguide/tutorials.html — Ben Allison, Oct 02 '14 at 15:52
`AttributeError: 'File' object has no attribute 'createCArray'` — Nico Schlömer, Nov 19 '19 at 20:05
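
The `create_array` route from Ben Allison's comment could be sketched as follows, assuming pytables 3.x. The temporary file path is a placeholder. Note that a plain array made this way is not chunked and takes no `filters`, so `create_carray` remains the option when compression is wanted.

```python
import os
import tempfile

import numpy as np
import tables

# Generate some data
x = np.random.random((20, 20, 20))

# Hypothetical output location in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "test_array.hdf")

# Store "x" as a plain (unchunked, uncompressed) array
with tables.open_file(path, "w") as f:
    f.create_array(f.root, "somename", obj=x)

# Read the whole array back, and a slice of it
with tables.open_file(path, "r") as f:
    y = f.root.somename[:]
    sub = f.root.somename[0, :5, :5]
```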
