
I am trying to create a generator function that yields images to train my machine learning model. Basically, it goes like this (a minimal example that reproduces the error on my side):

import h5py

def generator(hdf5_path: str, primary_keys: list):
    while True:
        with h5py.File(hdf5_path, "r") as f:
            for pk in primary_keys:
                yield f[pk][0]

gen = generator('/home/coder/images.hdf5', pk=['AAA', 'BBB'])
for i in range(5):
    image = next(gen)
    print(image)

With this code I expect to get 5 images, after which the generator should shut down and close the h5py file. But every time the generator stops, I get this error message:

Exception ignored in: <generator object generator at 0x7f41b112ab30>
Traceback (most recent call last):
  File "/home/coder/workspaces/data_processing.py", line 420, in generator
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/coder/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 461, in __exit__
  File "/home/coder/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 432, in close
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 267, in h5py.h5f.get_obj_ids
  File "h5py/h5i.pyx", line 37, in h5py.h5i.wrap_identifier
ImportError: sys.meta_path is None, Python is likely shutting down

From what I understand, this happens because the generator doesn't close properly, but I am not sure what to do about it.

I also read (here) that this could potentially be caught with a try/except on GeneratorExit, but that didn't work (I can catch it with Exception, but I want to understand what is going on). The other posts I found about this error involved the selenium package, which I am not using at all (here).

Any ideas? Is there a solution or an explanation for this behavior?

Thank you

Unic0
  • You generate a new generator in each "for" iteration. Move "gen = ..." before the for-loop. This may lead to another error but is a step in the right direction. – Michael Butscher Feb 20 '22 at 23:46
  • Sorry, my mistake: the code I am actually using only calls `next` inside the loop and does not create a new `generator` on each iteration. I edited the question to fix this. – Unic0 Feb 20 '22 at 23:56
  • The generator only receives two primary keys and can therefore only yield two images before failing with a different error. This doesn't make sense to me. – Michael Butscher Feb 21 '22 at 09:48
  • Agreed w/ Michael Butscher. You will only get 1 image from dataset `'AAA'` and 1 from `'BBB'`. Please share shape of the datasets so I can understand what `f[pk][0]` is reading. If you intend to read the entire dataset as an array, use `f[pk][()]` instead of `f[pk][0]`. – kcw78 Feb 21 '22 at 14:38

1 Answer


It's hard to diagnose your problem without more details about your data. That said, I can provide a simple example of how your generator might work for a file with similar dataset names and schema.

The code below is how I think you want your generator to work. Note that I fixed an error in the call so it matches the generator's signature (changed pk= to primary_keys=) and stored the keys in a list, pk_list, so its length can be passed to range(). Also, you don't need while True:.

import h5py

hdf5_path = '/home/coder/images.hdf5'  # path from the question; change as needed

def generator(hdf5_path: str, primary_keys: list):
    with h5py.File(hdf5_path, "r") as f:
        for pk in primary_keys:
            yield f[pk][()]  # read the whole dataset as a NumPy array

pk_list = ['AAA', 'BBB', 'CCC']
print('\ngenerator output:')
gen = generator(hdf5_path, primary_keys=pk_list)
for i in range(len(pk_list)):
    image = next(gen)
    print(image.shape)
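A hedged explanation of the original message, based on the traceback: when you stop calling next(), the generator stays paused at a yield inside the with block, so the HDF5 file is still open. With while True it can never run off the end of that block, so it is only finalized when it is garbage-collected; for a module-level generator that can happen while the interpreter is shutting down, which matches the "Python is likely shutting down" ImportError raised from h5py's close(). (After the last next() call in the loop above, the generator is likewise still paused at its final yield.) If you do want an endless generator for training, one option is to close it yourself with contextlib.closing (or an explicit gen.close()): that raises GeneratorExit at the paused yield and runs the file's __exit__ right away. The sketch below assumes a hypothetical FakeFile class standing in for h5py.File, so it runs without h5py or an HDF5 file.

```python
import contextlib


class FakeFile:
    """Hypothetical stand-in for h5py.File that only records open/closed state."""

    def __init__(self, data):
        self._data = data
        self.closed = False

    def __getitem__(self, key):
        return self._data[key]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # h5py.File.__exit__ closes the HDF5 file at this point.
        self.closed = True
        return False  # never suppress exceptions, including GeneratorExit


opened = []  # every FakeFile the generator opens, in order


def open_file():
    # Hypothetical helper: plays the role of h5py.File(hdf5_path, "r").
    f = FakeFile({"AAA": "image A", "BBB": "image B"})
    opened.append(f)
    return f


def generator(primary_keys):
    # Same structure as the question: an endless loop that reopens the file.
    while True:
        with open_file() as f:
            for pk in primary_keys:
                yield f[pk]


with contextlib.closing(generator(["AAA", "BBB"])) as gen:
    images = [next(gen) for _ in range(5)]
    # The generator is paused at `yield` inside `with`: the last file is open.
    still_open = not opened[-1].closed

# Leaving `closing` called gen.close(): GeneratorExit was raised at the paused
# `yield`, so FakeFile.__exit__ ran now instead of at interpreter shutdown.
print(images)
print(still_open, [f.closed for f in opened])
```

With the real file this would be `with contextlib.closing(generator(hdf5_path, primary_keys=['AAA', 'BBB'])) as gen:`. This may also explain why `except GeneratorExit` did not catch the message while `except Exception` did: GeneratorExit derives from BaseException, not Exception, and the exception printed in the traceback is the ImportError raised by h5py's `__exit__` while GeneratorExit was being handled.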

You don't really need to create a list of dataset names (aka keys) to read. HDF5 files are self-describing, and you can get the names with the .keys() method. The method below does the same thing without the primary_keys input parameter.

def generator2(hdf5_path: str):
    with h5py.File(hdf5_path, "r") as f:
        for pk in f.keys():
            yield f[pk][()]

print('\ngenerator 2 output:')    
# alternate method that does not require a list of keys as input 
gen2 = generator2(hdf5_path)
for image in gen2:
    print(image.shape)    
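In generator2 the for loop keeps calling next() until the generator is exhausted, so execution runs off the end of the with block and the file is closed before the loop finishes, not when the generator is garbage-collected. A small sketch of that behavior, again assuming a hypothetical FakeFile stand-in for h5py.File so it runs with the standard library only:

```python
class FakeFile:
    """Hypothetical stand-in for h5py.File that only records open/closed state."""

    def __init__(self, data):
        self._data = data
        self.closed = False

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True  # h5py.File.__exit__ closes the file here
        return False


def generator2(f):
    # Same shape as generator2 above, but takes an already-created file object.
    with f:
        for key in f.keys():
            yield f[key]


f = FakeFile({"AAA": 1, "BBB": 2, "CCC": 3})
seen = []
for image in generator2(f):
    seen.append((image, f.closed))  # the file is open while items are produced

# The loop exhausted the generator, so its `with` block has already exited.
print(seen, f.closed)
```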

Alternatively, it is simple enough to do this without a generator (unless you really need one for another part of your project). See the code below for that method:

# alternate method that does not use a generator   
print('\nalternate method output:')    
with h5py.File(hdf5_path, "r") as h5f: 
    for ds in h5f.keys():
        image = h5f[ds][()]
        print(image.shape)   

Finally, here is the code I used to create the simple test file for the examples above:

import numpy as np
import h5py

hdf5_path = '/home/coder/images.hdf5'  # path from the question; change as needed
with h5py.File(hdf5_path,'w') as h5f:
    arr = np.random.random(100).reshape(10,10)
    h5f.create_dataset('AAA',data=arr)
    arr = np.random.random(100).reshape(10,10)
    h5f.create_dataset('BBB',data=arr)
    arr = np.random.random(100).reshape(10,10)
    h5f.create_dataset('CCC',data=arr)
kcw78