
How can I store a multidimensional NumPy array in an HDF5 file using PyTables?

From what I can tell, I can't put an array field in a PyTables table.

I also need to store some info about this array and be able to do mathematical computations on it.

Any suggestions?

Asked by scripts (edited by Michael Currie)
  • Honestly, if you're storing a lot of just straight-up N-D arrays, you're better off with `h5py` instead of `pytables`. It's as simple as `f.create_dataset('name', data=x)`, where `x` is your NumPy array and `f` is the open HDF5 file. Doing the same thing in `pytables` is possible, but considerably more difficult. – Joe Kington Jan 12 '12 at 22:16
  • Joe, +1. I was about to post an almost identical comment. – Sven Marnach Jan 12 '12 at 22:21
  • I thought of that, but pytables has some features (`tables.expr`) to do calculations directly on the arrays. Can I have that with h5py? – scripts Jan 12 '12 at 22:22
  • @scripts - Not in the compressed, accelerated way that `pytables` does. (Or at least not that I know of, anyway.) `pytables` will also give you lots of nice querying abilities. `h5py` is better suited to straight-up storage and slicing of on-disk arrays (and is more pythonic, IMO, too). Not to plug my own answer too much, but my thoughts on the tradeoff between the two are here: http://stackoverflow.com/questions/7883646/exporting-from-importing-to-numpy-scipy-in-sqlite-and-hdf5-formats/7891137#7891137 – Joe Kington Jan 12 '12 at 22:34
  • Thanks for the info, Joe Kington. For my case pytables is better suited because of the powerful querying techniques. – scripts Jan 12 '12 at 22:43
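For comparison, here is a minimal sketch of the `h5py` route from the first comment, assuming h5py is installed. The file name `test_h5py.hdf5` and the `description` attribute are hypothetical; the attribute shows where "some info about this array" could live.

```python
import numpy as np
import h5py

# Same kind of data as in the question
x = np.random.random((100, 100, 100))

# Write: a single call creates the dataset from the NumPy array
with h5py.File('test_h5py.hdf5', 'w') as f:
    dset = f.create_dataset('name', data=x)
    # Hypothetical metadata stored alongside the array
    dset.attrs['description'] = 'random test cube'

# Read back: slicing reads only the requested part from disk
with h5py.File('test_h5py.hdf5', 'r') as f:
    first_plane = f['name'][0]  # NumPy array of shape (100, 100)
```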

1 Answer


There may be a simpler way, but this is how you'd go about doing it, as far as I know:

import numpy as np
import tables

# Generate some data
x = np.random.random((100,100,100))

# Store "x" in a chunked array (CArray)...
f = tables.open_file('test.hdf', 'w')
atom = tables.Atom.from_dtype(x.dtype)  # element type taken from x (float64 here)
ds = f.create_carray(f.root, 'somename', atom, x.shape)
ds[:] = x  # write the whole array to disk
f.close()
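The question also asks how to keep some information about the array. A minimal sketch, assuming PyTables 3.x (hence `create_carray`), stores metadata in the node's attribute set and then reopens the file read-only to slice the array. The attribute names `description` and `voxel_size` are hypothetical examples.

```python
import numpy as np
import tables

x = np.random.random((100, 100, 100))

# Write the array as in the answer above
with tables.open_file('test.hdf', 'w') as f:
    atom = tables.Atom.from_dtype(x.dtype)
    ds = f.create_carray(f.root, 'somename', atom, x.shape)
    ds[:] = x
    # Hypothetical metadata, stored as HDF5 attributes on the node
    ds.attrs.description = 'random test cube'
    ds.attrs.voxel_size = 0.5

# Reopen read-only; slicing reads just that part from disk
with tables.open_file('test.hdf', 'r') as f:
    node = f.root.somename
    shape = node.shape
    plane = node[0, :, :]  # NumPy array of shape (100, 100)
    voxel_size = node.attrs.voxel_size
```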

If you want to specify the compression to use, have a look at `tables.Filters`. For example:

import numpy as np
import tables

# Generate some data
x = np.random.random((100,100,100))

# Store "x" in a chunked array with level 5 BLOSC compression...
f = tables.open_file('test.hdf', 'w')
atom = tables.Atom.from_dtype(x.dtype)
filters = tables.Filters(complib='blosc', complevel=5)
ds = f.create_carray(f.root, 'somename', atom, x.shape, filters=filters)
ds[:] = x
f.close()
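To check that the filters were actually applied, a sketch along these lines (assuming PyTables 3.x with its bundled Blosc support) reopens the file and inspects the node's `filters` and `chunkshape`. The file name is hypothetical. Note that uniform random floats compress poorly, so this test data won't show much space saving.

```python
import numpy as np
import tables

x = np.random.random((100, 100, 100))
filters = tables.Filters(complib='blosc', complevel=5)

with tables.open_file('test_blosc.hdf', 'w') as f:
    atom = tables.Atom.from_dtype(x.dtype)
    ds = f.create_carray(f.root, 'somename', atom, x.shape, filters=filters)
    ds[:] = x

with tables.open_file('test_blosc.hdf', 'r') as f:
    node = f.root.somename
    complib = node.filters.complib      # compression library recorded on the node
    complevel = node.filters.complevel  # compression level recorded on the node
    chunkshape = node.chunkshape        # chunk shape PyTables chose automatically
    roundtrip_ok = np.array_equal(node[:], x)  # Blosc is lossless
```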

There's probably a simpler way to do a lot of this; I haven't used PyTables for anything other than table-like data in a long while.

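Note: PyTables 3.0 renamed `f.createCArray` to `f.create_carray` (along with the other camelCase names, e.g. `openFile` became `open_file`), and newer releases no longer provide the old names. `create_carray` can also accept the array directly through its `obj` argument, without specifying the atom or shape: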

f.create_carray('/', 'somename', obj=x, filters=filters)
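The comments mention computing directly on the on-disk arrays. A minimal sketch of that with `tables.Expr`, assuming PyTables 3.x (which uses numexpr under the hood): the expression is evaluated block by block and written into a second on-disk array, so the full result never has to sit in memory. The node names `a` and `result` and the expression itself are hypothetical.

```python
import numpy as np
import tables

x = np.random.random((100, 100, 100))
filters = tables.Filters(complib='blosc', complevel=5)

with tables.open_file('test_expr.hdf', 'w') as f:
    # Atom and shape are inferred from obj
    a = f.create_carray('/', 'a', obj=x, filters=filters)
    # On-disk output array with the same atom and shape as the input
    out = f.create_carray('/', 'result', atom=a.atom, shape=a.shape,
                          filters=filters)
    # Bind the name 'a' in the expression to the on-disk array
    expr = tables.Expr('2 * a + 1', uservars={'a': a})
    expr.set_output(out)  # write results into 'out' instead of returning them
    expr.eval()
    result_ok = np.allclose(out[:], 2 * x + 1)
```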
Answered by Joe Kington (edited by Suyog Jadhav)
  • Note that this can now be done much more straightforwardly using the `create_array` method on file objects, as described in the section 'Creating new array objects' at http://pytables.github.io/usersguide/tutorials.html – Ben Allison Oct 02 '14 at 15:52
  • `AttributeError: 'File' object has no attribute 'createCArray'` – Nico Schlömer Nov 19 '19 at 20:05
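The `create_array` approach from the comments looks roughly like this (a sketch assuming PyTables 3.x; the file name is hypothetical). It stores `x` as a plain, non-chunked `Array` node, which is the simplest option but does not support compression filters, so `create_carray` remains the choice when compression matters. The `AttributeError` above comes from calling the pre-3.0 name `createCArray` on a current PyTables `File`.

```python
import numpy as np
import tables

x = np.random.random((100, 100, 100))

with tables.open_file('test_array.hdf', 'w') as f:
    # Shape and dtype are taken from x; no atom needed
    arr = f.create_array(f.root, 'somename', x)
    node_class = type(arr).__name__  # plain, non-chunked Array node

with tables.open_file('test_array.hdf', 'r') as f:
    same = np.array_equal(f.root.somename.read(), x)
```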