I'm using Python in the lab to control measurements. I often loop over one quantity (say, voltage), measure another (current), and repeat each measurement a few times so that I can average the results later. Since I want to keep all the measured data, I write it to disk immediately, and to keep things organized I use the HDF5 file format. This format is hierarchical: a file has an internal directory-like structure with Unix-style paths (e.g. / is the root of the file). Groups are the equivalent of directories, and datasets are roughly equivalent to files and contain the actual data. The resulting code looks something like:

import h5py

hdf_file = h5py.File('data.h5', 'w')
for v in range(5):
    # one group per voltage setting: /0, /1, ...
    group = hdf_file.create_group('/' + str(v))
    v_source.voltage = v
    for i in range(3):
        # one subgroup per repeated measurement: /0/0, /0/1, ...
        group2 = group.create_group(str(i))

        current = i_probe.current
        group2.create_dataset('current', data=current)
hdf_file.close()

I've written a small library to handle communication with the instruments in the lab, and I want this library to store the data to file automatically, without explicit instructions in the script. The problem is that the groups (or directories, if you prefer) still have to be created explicitly at the start of each for loop. I want to get rid of all the file-handling code in the script, so I would like some way to write to a new group automatically on each iteration of the for loop. One way of achieving this would be to somehow modify the for statement itself, but I'm not sure how to do that. In more elaborate experiments the for loops can of course be nested more deeply.

Ideally I would be left with something along the lines of:

import h5py

hdf_file = h5py.File('data.h5', 'w')
for v_source.voltage in range(5): # v_source.voltage=x sets the voltage of a physical device to x
    for i in range(3):
        current = i_probe.current # i_probe.current reads the current from a physical device
        current_group.create_dataset('current', data = current)
hdf_file.close()

Any pointers to implement this solution or something equally readable would be very welcome.

Edit:

The code below includes all class definitions etc and might give a better idea of my intentions. I'm looking for a way to move all the file IO to a library (e.g. the Instrument class).

import h5py


class Instrument(object):
    def __init__(self, address):
        self.address = address

    @property
    def value(self):
        print('getting value from {}'.format(self.address))
        return 2 # dummy value instead of value read from instrument

    @value.setter
    def value(self, value):
        print('setting value of {} to {}'.format(self.address, value))


source1 = Instrument('source1')
source2 = Instrument('source2')
probe = Instrument('probe')

hdf_file = h5py.File('data.h5', 'w')
for v in range(5):
    source1.value = v
    group = hdf_file.create_group('/' + str(v))
    group.attrs['address'] = source1.address
    for i in range(4):
        source2.value = i
        group2 = group.create_group(str(i))
        group2.attrs['address'] = source2.address

        group2.create_dataset('current', data=probe.value)
hdf_file.close()
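
One way to approximate the "modified for statement" idea is to wrap each loop in a generator that sets the instrument and remembers the current value as a path component, so the library can work out the group name by itself. The sketch below is only an illustration under assumptions: Recorder, sweep and store are hypothetical names, the Instrument class is a stripped-down stand-in for the one above, and a plain dict stands in for the HDF5 file.

```python
class Instrument(object):
    """Minimal stand-in for the Instrument class above (no real hardware)."""

    def __init__(self, address):
        self.address = address
        self.value = None


class Recorder(object):
    """Hypothetical helper: tracks nested sweeps and stores readings by path."""

    def __init__(self):
        self.path = []   # one entry per active sweep level
        self.data = {}   # plain dict standing in for the HDF5 file

    def sweep(self, instrument, values):
        """Set instrument.value to each value in turn and yield it."""
        for value in values:
            instrument.value = value
            self.path.append(str(value))
            try:
                yield value
            finally:
                # Runs when the loop requests the next value
                # (or when the generator is closed).
                self.path.pop()

    def store(self, name, value):
        """Store value under the path defined by the enclosing sweeps."""
        key = '/' + '/'.join(self.path + [name])
        self.data[key] = value


source1 = Instrument('source1')
source2 = Instrument('source2')
rec = Recorder()

for v in rec.sweep(source1, range(2)):
    for i in rec.sweep(source2, range(3)):
        rec.store('current', 10 * v + i)  # dummy reading

print(sorted(rec.data))
```

The script no longer creates any groups; the nesting of the sweep calls determines the path. In a real library, store would presumably write to the HDF5 file rather than to a dict, and could also record the instrument address as a group attribute.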
Octaviour
  • The code doesn't really make sense unless we know what v_source is, since your first code snippet overwrites v_source.voltage with v on every iteration. Is v_source supposed to be different on each iteration of the loop? – Cameron Aavik Mar 24 '17 at 11:06
  • `v_source` and `i_probe` are objects that represent physical instruments. Assigning to `v_source.voltage` sets the voltage of the instrument; `i_probe.current` reads the current from the instrument. The instrument remains the same on each iteration (and thus `v_source`), but its output voltage changes (and thus `v_source.voltage`). (edited original post to reflect this more clearly) – Octaviour Mar 24 '17 at 13:16

1 Answer

It is hard to say without seeing the code, but from the looks of it the Pythonic way to do this is to check, every time you add a new dataset, whether the directory exists: if it does, add the new dataset to it, and if it doesn't, create the directory first. This question might help:

Writing to a new file if not exist, append to file if it do exist

Use the same approach to create a directory rather than a new file. Another helpful one might be:

How to check if a directory exists and create it if necessary?
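
For HDF5 groups specifically, h5py offers `require_group`, which opens a group and creates it (including intermediate groups) if it does not exist yet, so no separate existence check is needed. A minimal sketch, assuming the full group path is already known; the helper name store and the in-memory file are illustrative choices and not part of the original script:

```python
import h5py


def store(hdf_file, path, name, value):
    """Write value as dataset `name` in group `path`, creating the group if needed."""
    group = hdf_file.require_group(path)    # opens the group, or creates it
    group.create_dataset(name, data=value)


# In-memory file so the sketch leaves nothing on disk (assumption for the demo).
hdf_file = h5py.File('sketch.h5', 'w', driver='core', backing_store=False)

store(hdf_file, '/0/1', 'current', 2.0)   # creates /0 and /0/1
store(hdf_file, '/0/1', 'voltage', 0.5)   # reuses the existing group

stored_names = sorted(hdf_file['/0/1'].keys())
stored_current = float(hdf_file['/0/1/current'][()])
hdf_file.close()
```

This only removes the existence check; the caller still has to supply the path.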

A. N. Other
  • But how would you know what the correct name of the directory is? This name depends on the number of `for` loops surrounding the `create_dataset` command and the value at that specific iteration. What code would you like to see additionally? – Octaviour Mar 24 '17 at 13:14
  • Interesting. From the looks of it, the directory name depends entirely on the voltage v. You would therefore need to pass v to your `create_dataset` call to be able to do the above. Otherwise I think it is impossible, as you are missing a piece of data. – A. N. Other Mar 24 '17 at 13:21
  • I was afraid so. Since the data is ultimately in the script I was hoping there was a way to do what I want. I wouldn't be surprised if there really was no way to accomplish this though. – Octaviour Mar 24 '17 at 15:20