4

I have several HDF5 files, each with the same structure. I'd like to merge them into a single PyTables file.

What I mean is that if an array in file1 has size x and the corresponding array in file2 has size y, the resulting array will have size x+y, containing first all the entries from file1 and then all the entries from file2.

Salvador Dali
Asen Christov

1 Answer

6

How you do this depends on the node type you are working with. Array and CArray nodes have a fixed size, so you need to preallocate space for the merged data. You would do something like the following:

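import tables as tb

# Open the two inputs read-only; the output file must be writable ('a' or 'w').
file1 = tb.open_file('/path/to/file1', 'r')
file2 = tb.open_file('/path/to/file2', 'r')
file3 = tb.open_file('/path/to/file3', 'a')
x = file1.root.x
y = file2.root.y

# Preallocate an array large enough for both inputs, then fill it in order.
z = file3.create_array('/', 'z', atom=x.atom, shape=(x.nrows + y.nrows,))
z[:x.nrows] = x[:]
z[x.nrows:] = y[:]

# Close every file so the merged data is flushed to disk.
for f in (file1, file2, file3):
    f.close()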
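
Since the question mentions several files, the same preallocation idea can be wrapped in a loop. The following is only a sketch under stated assumptions: the function name `merge_into_carray` and the default node name `x` are hypothetical, every input is assumed to hold a 1-D Array or CArray of the same dtype at the root, and each input is read fully into memory while it is copied.

```python
import tables as tb


def merge_into_carray(in_paths, out_path, node_name="x"):
    """Concatenate the 1-D node `node_name` from each input into one CArray.

    Hypothetical helper: assumes every input file has a root-level node
    called `node_name` with the same atom (dtype).
    """
    sources = [tb.open_file(p, "r") for p in in_paths]
    try:
        nodes = [f.get_node("/", node_name) for f in sources]
        # Fixed-size nodes need the total length up front.
        total = sum(n.nrows for n in nodes)
        with tb.open_file(out_path, "w") as out:
            merged = out.create_carray(
                "/", node_name, atom=nodes[0].atom, shape=(total,)
            )
            start = 0
            for n in nodes:
                # Copy each input into its slice of the output, in file order.
                merged[start:start + n.nrows] = n[:]
                start += n.nrows
    finally:
        for f in sources:
            f.close()
    return total
```

Calling `merge_into_carray(['/path/to/file1', '/path/to/file2'], '/path/to/file3')` would write the combined array to the output file and return its length. For inputs too large to fit in memory, the `n[:]` read would need to be replaced by copying in smaller slices.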

EArrays and Tables, however, are extendable. You don't need to preallocate anything: copy the node from the first file into the output file with copy_node(), then append() the data from the second file to that copy.

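import tables as tb

# Open the two inputs read-only; the output file must be writable ('a' or 'w').
file1 = tb.open_file('/path/to/file1', 'r')
file2 = tb.open_file('/path/to/file2', 'r')
file3 = tb.open_file('/path/to/file3', 'a')
x = file1.root.x
y = file2.root.y

# z is a copy of x living in file3; appending y's rows extends it in place.
z = file1.copy_node('/', name='x', newparent=file3.root, newname='z')
z.append(y[:])

# Close every file so the merged data is flushed to disk.
for f in (file1, file2, file3):
    f.close()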
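
For more than two files, the copy-then-append pattern might be generalized as in the sketch below. The function name `merge_extendable` and the default node name `x` are hypothetical, and the code assumes each input has an EArray (or Table) with that name at the root and a compatible atom or row description.

```python
import tables as tb


def merge_extendable(in_paths, out_path, node_name="x"):
    """Copy the node from the first file, then append the same node from the rest.

    Hypothetical helper: assumes `node_name` is an EArray or Table at the
    root of every input file.
    """
    with tb.open_file(out_path, "w") as out:
        with tb.open_file(in_paths[0], "r") as first:
            # copy_node returns the new node, which belongs to the output file.
            merged = first.copy_node("/", name=node_name, newparent=out.root)
        for path in in_paths[1:]:
            with tb.open_file(path, "r") as src:
                # Read the rows into memory and append them to the copy.
                merged.append(src.get_node("/", node_name)[:])
        return merged.nrows
```

Opening the output with `'w'` starts from an empty file on each run; use `'a'` instead if the output file already holds other nodes that should be kept.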
Anthony Scopatz
  • This might be obvious, but I am unclear about what the last two lines are doing. Is z supposed to be the combined output file? Are those two lines doing the same thing? Is it possible to clarify the variable naming and definitions here? – aarslan Oct 16 '14 at 13:39
  • The file being written to needs to be opened in append (or write) mode. So use `'a'` instead of `'r'` when opening `file3`. – 153957 May 10 '16 at 12:38
  • In order for the example to work, I also had to change the final row to: `z.append(y[:])` – ot226 Feb 11 '18 at 11:38