Suppose we have a matrix (for example, a NumPy array) stored in an HDF5 file, and we later want to extend it by appending rows to the end. Keep in mind that the original matrix can be very large (tens of GB) and cannot be loaded into RAM.

We also want to be able to read a few rows starting from any position in the matrix (a slice) without loading the whole matrix into RAM.

Can anyone provide an example of how this can be done in Python?

UPDATE:

I think another option is numpy.memmap, but it seems to have no append operation.

This also seems to be an option, but it operates on raw binary data, whereas I want to access the data as a matrix. I also don't know how to append in this case.
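On the numpy.memmap point, a minimal sketch of one possible workaround, not a built-in feature: a memmap is a view over a raw file, so rows can be added by writing bytes to the end of the file and then mapping it again with a larger shape. The file name, dtype, and matrix width below are hypothetical values chosen for illustration, and the approach assumes row-major (C order) storage with a fixed number of columns.

```python
import os
import tempfile

import numpy as np

N_COLS = 3          # hypothetical matrix width
DTYPE = np.float32  # hypothetical element type
path = os.path.join(tempfile.mkdtemp(), "matrix.bin")  # hypothetical file name

# Write an initial 4-row matrix as raw row-major bytes.
np.arange(12, dtype=DTYPE).reshape(4, N_COLS).tofile(path)

# "Append" by writing more raw rows to the end of the file.
new_rows = np.full((2, N_COLS), 7, dtype=DTYPE)
with open(path, "ab") as fh:
    fh.write(new_rows.tobytes())

# The raw file stores no shape, so derive the row count from its size.
n_rows = os.path.getsize(path) // (N_COLS * np.dtype(DTYPE).itemsize)
mm = np.memmap(path, dtype=DTYPE, mode="r", shape=(n_rows, N_COLS))

# Slicing reads only the requested rows; np.array copies them out of the map.
tail = np.array(mm[3:6])
del mm  # drop the reference to the mapping
```

The trade-off is that dtype and column count have to be tracked separately, since the file itself carries no metadata.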

mrgloom

1 Answer

If you're going to be working with HDF5 files, may I suggest using one of the available libraries, such as PyTables. The example below is adapted and simplified from their tutorial: http://pytables.github.io/usersguide/tutorials.html

from tables import *

# Define a user record to characterize some kind of particles
class Particle(IsDescription):
    name      = StringCol(16)   # 16-character String
    idnumber  = Int64Col()      # Signed 64-bit integer
    ADCcount  = UInt16Col()     # Unsigned short integer
    TDCcount  = UInt8Col()      # unsigned byte
    grid_i    = Int32Col()      # integer
    grid_j    = Int32Col()      # integer
    pressure  = Float32Col()    # float  (single-precision)
    energy    = FloatCol()      # double (double-precision)

filename = "test.h5"
# Open a file in "w"rite mode
# (snake_case names are the PyTables 3.x API; openFile/createGroup/createTable are the older spellings)
h5file = open_file(filename, mode = "w", title = "Test file")
# Create a new group under "/" (root)
group = h5file.create_group("/", 'detector', 'Detector information')
# Create one table on it
table = h5file.create_table(group, 'readout', Particle, "Readout example")
# Fill the table with 10 particles
particle = table.row
for i in range(10):
    particle['name']  = 'Particle: %6d' % (i)
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i*i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Insert a new particle record
    particle.append()
# Close (and flush) the file
h5file.close()

# Reopen the file in "a"ppend mode; slices are read from disk on demand
f = open_file(filename, mode="a")
ro = f.root.detector.readout
# Every third row starting at row 1; only the selected rows are loaded
print(ro[1::3])
# Table title stored as an HDF5 attribute
print(ro.attrs.TITLE)

# Queries return an iterator over matching rows, so the whole table is never loaded
print([row['energy'] for row in ro.where('pressure > 10')])


#append some data
table = f.root.detector.readout
particle = table.row
for i in range(10, 15):
    particle['name']  = 'Particle: %6d' % (i)
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i*i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Queue the new record; rows already on disk are not read
    particle.append()
table.flush()
f.close()
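
The table above stores records with named columns. For a plain numeric matrix, as in the question, a similar sketch can be written with a PyTables extendable array (EArray). This is an illustration under assumptions rather than part of the tutorial: the file name, the node name "matrix", and the 4-column width are hypothetical, and it assumes the PyTables 3.x API.

```python
import os
import tempfile

import numpy as np
import tables

N_COLS = 4  # hypothetical matrix width, chosen for illustration
path = os.path.join(tempfile.mkdtemp(), "matrix.h5")  # hypothetical file name

# Create an extendable array: the 0 in the shape marks the axis that can grow.
with tables.open_file(path, mode="w") as h5:
    earr = h5.create_earray(h5.root, "matrix",
                            atom=tables.Float64Atom(), shape=(0, N_COLS))
    earr.append(np.arange(12, dtype=np.float64).reshape(3, N_COLS))

# Reopen in append mode and add two more rows; existing rows are not read.
with tables.open_file(path, mode="a") as h5:
    h5.root.matrix.append(np.full((2, N_COLS), -1.0))

# Read a slice of rows; only the requested rows are loaded into memory.
with tables.open_file(path, mode="r") as h5:
    n_rows = h5.root.matrix.nrows
    middle = h5.root.matrix[1:4, :]
```

Here middle is an ordinary NumPy array holding rows 1 to 3, so it can be used after the file is closed.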
Paul