Cannot retrieve Datasets in PyTables using natural naming

Question

I'm new to PyTables and want to retrieve a dataset from an HDF5 file using natural naming, but this input raises an error:

f = tables.open_file("filename.h5", "r")

f.root.group-1.dataset-1.read()

group / does not have a child named group

and if I try:

f.root.group\-1.dataset\-1.read()


unexpected character after line continuation character

I can't rename the groups because the file holds a large amount of data from an experiment.

I suspect the problem is with your group names (and NOT with your code). Have you opened the file with **HDFView** and confirmed the group names? Do you have a group named `group-1` at the root level? If so, I think you have a problem. Natural Naming wants names that follow this pattern: `[a-zA-Z_][a-zA-Z0-9_]*`, so `group_1` is OK and `group-1` is not. — kcw78, Mar 18 '19 at 17:19
Yes, the groups are named like that, and they do exist. With h5py I can read the datasets without problems: `f["/group-1/dataset-1"][:]` shows me the values, and ViTables lets me view them. I can't afford to spend resources renaming everything from - to _, but I will try it and see how long the server takes to change a file. — Andr Fabin Castellanos Aldama, Mar 19 '19 at 01:51

score 1 · Accepted Answer · answered Mar 19 '19 at 20:44

You can't use the minus sign (hyphen) with Natural Naming because it isn't a valid character in a Python identifier (group-1 and dataset-1 look like subtraction operations!). See this discussion:
why-python-does-not-allow-hyphens
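
As a rough illustration of the point above, the following standard-library sketch checks whether a node name could be written as a Python attribute at all, and shows that a hyphenated name parses as a subtraction. The helper name `is_python_attribute_name` is hypothetical and not part of PyTables. It covers only the Python-identifier side of the rule, not any extra names PyTables may reserve.

```python
import ast
import keyword


def is_python_attribute_name(name: str) -> bool:
    """Hypothetical helper: True if `name` can appear after a dot in Python."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Names taken from the question and from the first comment.
checks = {name: is_python_attribute_name(name) for name in ("group_1", "group-1")}

# Python reads "root.group-1" as (root.group) - 1, i.e. a subtraction.
tree = ast.parse("root.group-1", mode="eval")
parses_as_subtraction = isinstance(tree.body, ast.BinOp) and isinstance(
    tree.body.op, ast.Sub
)

print(checks)
print(parses_as_subtraction)
```

This is why the first attempt in the question looks for a child named `group`: the attribute lookup stops at the hyphen.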

If you have groups and datasets that use this naming convention, you will have to use the file.get_node() method to access them. Here's a simple code snippet to demonstrate. The first part creates 2 groups, each with a table (dataset): #1 uses _ and #2 uses - in the group and table names. The second part reopens the file, then reads dataset #1 with Natural Naming and dataset #2 with file.get_node().

import tables as tb
import numpy as np

# Create h5 file with 2 groups and datasets:
# '/group_1', 'ds_1' : Natural Naming Supported
# '/group-2', 'ds-2' : Natural Naming NOT Supported
h5f = tb.open_file('SO_55211646.h5', 'w')

h5f.create_group('/', 'group_1')
h5f.create_group('/', 'group-2')

mydtype = np.dtype([('a', float), ('b', float), ('c', float)])
h5f.create_table('/group_1', 'ds_1', description=mydtype)
h5f.create_table('/group-2', 'ds-2', description=mydtype)

# Close, then Reopen file READ ONLY
h5f.close()

h5f = tb.open_file('SO_55211646.h5', 'r')

testds_1 = h5f.root.group_1.ds_1.read()
print(testds_1.dtype)

# these aren't valid Python statements:
#testds-2 = h5f.root.group-2.ds-2.read()
#print (testds-2.dtype)

testds_2 = h5f.get_node('/group-2','ds-2').read()
print(testds_2.dtype)

h5f.close()

Glad my example helped you with Natural Naming and Python variables. Natural Naming works when you know the group and table/dataset names in advance (and can code them in). I initially used natural naming, but over time I found `h5file.get_node()` to be more useful as I generalized my procedures to walk nodes and work with the returned group and dataset names. Also, upvotes are nice too. :-) — kcw78, Mar 22 '19 at 13:50
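
The node-walking approach mentioned in the comment above might look something like the sketch below. It assumes PyTables and NumPy are installed, and the file name `walk_demo.h5` and its layout are made up for the illustration. It builds a small file with hyphenated names, then finds every table with `walk_nodes()` and reads each one through `get_node()` using its path string, so no name ever has to be a valid Python identifier.

```python
import os
import tempfile
import warnings

import numpy as np
import tables as tb

# Hypothetical file name and layout, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "walk_demo.h5")
row_dtype = np.dtype([("a", float), ("b", float)])

with warnings.catch_warnings():
    # PyTables warns (but does not fail) when a name is not a valid identifier.
    warnings.simplefilter("ignore", tb.NaturalNameWarning)
    with tb.open_file(path, "w") as h5f:
        h5f.create_group("/", "group-1")
        h5f.create_table("/group-1", "dataset-1", description=row_dtype)

found = {}
with tb.open_file(path, "r") as h5f:
    # Visit every Table in the file, then fetch each one by its path string.
    for node in h5f.walk_nodes("/", classname="Table"):
        found[node._v_pathname] = h5f.get_node(node._v_pathname).read()

print(sorted(found))
```

Because the paths are discovered at run time, the same loop works whether the names contain hyphens or not.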
