
When I use either h5py's `visit()` or `visititems()` on the file h5ex_g_visit.h5 (downloaded from http://mirror.fcaglp.unlp.edu.ar/pub/ftp.hdfgroup.org/HDF5/examples/examples-by-api/files/exbyapi/h5ex_g_visit.h5), neither works as expected.

A simple program:

```python
import h5py

def print_objs(name):
    print(name)

fd = h5py.File('h5ex_g_visit.h5', 'r')  # open read-only
fd.visit(print_objs)
```

It prints

```
group1
group1/dset1
group1/group3
group1/group3/group4
group1/group3/group4/group1
group1/group3/group4/group2
```

I think it should print

```
group1
group1/dset1
group1/group3
group1/group3/dset2
group1/group3/group4
group1/group3/group4/group1
group1/group3/group4/group1/group5
group1/group3/group4/group2
group2
group2/dset2
group2/group4
group2/group4/group1
group2/group4/group1/group5
group2/group4/group1/group5/dset1
group2/group4/group1/group5/group3
group2/group4/group1/group2
```

You get the same missing objects when using visititems.

It looks like it finds the first group in a level and follows that path, without ever returning to pick up the other groups and datasets at that level. It also doesn't seem to go deeper than 4 levels.

This works correctly for the similar function in C.

Are these Python methods broken, is the HDF5 file broken, or am I doing something wrong?

Thanks

Chris_Rands
dcellis
Please check the URL to your HDF5 file. It gave me a "404 Not Found" error when I tried it. Regarding `visititems()` behavior, look at this Answer. It shows how `visititems()` descends multiple groups: [a way to get datasets in all groups](https://stackoverflow.com/a/63319414/10462884) – kcw78 Aug 11 '20 at 20:40

1 Answer


I found your file here: h5ex_g_visit.h5. You're right, there's something strange about it. I get the same output as you did when I run my `visititems()` callable on it. I also tried PyTables' `.iter_nodes()` method, which is a different way to iterate over all nodes. That was even worse; it gets stuck in a loop.

UPDATE: I should have inspected your file with h5dump before answering. The file has hardlinks. See output from h5dump below:

```
C:\ > h5dump h5ex_g_visit.h5
HDF5 "h5ex_g_visit.h5" {
GROUP "/" {
   GROUP "group1" {
      DATASET "dset1" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1, 1 ) / ( 1, 1 ) }
         DATA {
         (0,0): 0
         }
      }
      GROUP "group3" {
         DATASET "dset2" {
            HARDLINK "/group1/dset1"
         }
         GROUP "group4" {
            GROUP "group1" {
               GROUP "group5" {
                  HARDLINK "/group1"
               }
            }
            GROUP "group2" {
            }
         }
      }
   }
   GROUP "group2" {
      HARDLINK "/group1/group3"
   }
}
```

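So, the "missing objects" are not really missing: they are extra names (hard links) for objects that have already been reported. h5py's `visit()` and `visititems()` are built on the HDF5 C function `H5Ovisit`, which visits each object only once, no matter how many hard links point to it. The links also form a cycle (`/group1/group3/group4/group1/group5` points back to `/group1`), so a traversal that blindly follows every link would never finish, which is probably what tripped up PyTables. Ugh.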
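To check this without the original download, here is a minimal sketch that rebuilds the link structure from the h5dump output with h5py. The file name `hardlink_demo.h5` is made up. In h5py, assigning an existing object to a new name (for example `g3["dset2"] = d1`) creates a hard link. Running `visit()` on the result should print the same six names the question reports, because `dset2`, `group5` and `/group2` are aliases for objects that were already visited.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name; rebuilds the layout shown by h5dump.
path = os.path.join(tempfile.mkdtemp(), "hardlink_demo.h5")

with h5py.File(path, "w") as f:
    g1 = f.create_group("group1")
    d1 = g1.create_dataset("dset1", data=np.zeros((1, 1), dtype="i4"))
    g3 = g1.create_group("group3")
    g3["dset2"] = d1                          # hard link to /group1/dset1
    g4 = g3.create_group("group4")
    g4.create_group("group1")["group5"] = g1  # hard link back to /group1 (a cycle)
    g4.create_group("group2")
    f["group2"] = g3                          # hard link to /group1/group3

names = []
with h5py.File(path, "r") as f:
    f.visit(names.append)  # append() returns None, so visiting continues

for name in names:
    print(name)
```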

This code shows how it works if you DON'T have hard links. I created an HDF5 file that mimics your data schema. When I run my `visititems()` callable on it, it prints the name of every object in the tree. See code below.

My conclusion: the "problem" is due to the hard links, not to `visit()` or `visititems()`.

```python
import h5py
import numpy as np

def visit_func(name, node):
    # visititems() passes the relative name and the object; print its absolute path
    print(node.name)

arr = np.arange(100).reshape(10, 10)

# Same schema as the question, but every object is real (no hard links)
with h5py.File('SO_63364951.h5', 'w') as h5w:
    h5w.create_group('group1')
    h5w['group1'].create_dataset('dset1', data=arr)
    h5w['group1'].create_group('group3')
    h5w['group1/group3'].create_dataset('dset2', data=arr)
    h5w['group1/group3'].create_group('group4')
    h5w['group1/group3/group4'].create_group('group1')
    h5w['group1/group3/group4/group1'].create_group('group5')
    h5w['group1/group3/group4'].create_group('group2')

    h5w.create_group('group2')
    h5w['group2'].create_dataset('dset2', data=arr)
    h5w['group2'].create_group('group4')
    h5w['group2/group4'].create_group('group1')
    h5w['group2/group4/group1'].create_group('group5')
    h5w['group2/group4/group1/group5'].create_dataset('dset1', data=arr)
    h5w['group2/group4/group1/group5'].create_group('group3')
    h5w['group2/group4/group1'].create_group('group2')

with h5py.File('SO_63364951.h5', 'r') as h5r:
    h5r.visititems(visit_func)
```
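If you actually want every path, including the hard-linked aliases, you have to walk the links yourself. Below is a sketch of one way to do that; `walk_links` is a hypothetical helper, not an h5py method. It assumes that two h5py `ObjectID`s compare equal (and hash equal) when they refer to the same HDF5 object, and uses that to expand each group only once, which keeps the `group5` to `/group1` cycle from looping forever. The file (hypothetical name `hardlink_demo.h5`) rebuilds the layout from the h5dump output.

```python
import os
import tempfile

import h5py
import numpy as np


def walk_links(group):
    """Yield every link path below `group` (hypothetical helper, not part of h5py).

    Each link is reported, so hard-linked aliases appear under every name,
    but each group object is expanded only once, which also stops cycles.
    Assumes h5py ObjectIDs compare equal when they refer to the same object.
    """
    seen = {group.id}

    def _walk(grp, prefix):
        for name in sorted(grp):  # sort link names for a deterministic order
            path = f"{prefix}/{name}" if prefix else name
            yield path
            obj = grp.get(name)  # None if the link cannot be resolved
            if isinstance(obj, h5py.Group) and obj.id not in seen:
                seen.add(obj.id)
                yield from _walk(obj, path)

    yield from _walk(group, "")


path = os.path.join(tempfile.mkdtemp(), "hardlink_demo.h5")
with h5py.File(path, "w") as f:
    g1 = f.create_group("group1")
    d1 = g1.create_dataset("dset1", data=np.zeros((1, 1), dtype="i4"))
    g3 = g1.create_group("group3")
    g3["dset2"] = d1                          # hard link to /group1/dset1
    g4 = g3.create_group("group4")
    g4.create_group("group1")["group5"] = g1  # hard link back to /group1 (a cycle)
    g4.create_group("group2")
    f["group2"] = g3                          # hard link to /group1/group3

with h5py.File(path, "r") as f:
    links = list(walk_links(f["/"]))  # consume while the file is open

for p in links:
    print(p)
```

On that layout it should report nine paths: the six that `visit()` gives plus `group1/group3/dset2`, `group1/group3/group4/group1/group5` and `group2`. It deliberately does not descend below `group5` or `group2`, because those groups have already been expanded; dropping the `seen` check would make it follow the cycle forever.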
kcw78