Answer 3
Use the copy method of the Group class from h5py.
TL;DR
This works on groups and datasets.
Is recursive (can do deep and shallow copies).
Has options for attributes, symbolic links and references.
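import h5py

with h5py.File('destFile.h5', 'w') as f_dest:
    with h5py.File('srcFile.h5', 'r') as f_src:
        # The destination group must exist; require_group creates it if missing.
        f_src.copy(f_src["/path/to/DataSet"], f_dest.require_group("/another/path"), "DataSet")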
(The file object is also the root group.)
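To illustrate the deep versus shallow point from the TL;DR, here is a minimal sketch. The file name and the group layout (g, g/d, g/sub/e) are made up for the demonstration; shallow is a keyword argument of the copy method in h5py.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical throwaway file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "shallow_demo.h5")

with h5py.File(path, "w") as f:
    # Intermediate groups (g, g/sub) are created automatically.
    f.create_dataset("/g/d", data=np.arange(3))
    f.create_dataset("/g/sub/e", data=np.arange(3))

    # Default: recursive copy of everything below g.
    f.copy(f["g"], f, "deep")
    # shallow=True: only the immediate members of g are copied.
    f.copy(f["g"], f, "shallow", shallow=True)

    deep_has_nested = "sub/e" in f["deep"]
    shallow_has_d = "d" in f["shallow"]
    shallow_has_nested = "sub/e" in f["shallow"]
```

The three flags at the end record whether the nested dataset made it into each copy, which is a quick way to check the behaviour on your own h5py version.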
Locations in HDF5
"An HDF5 file is organized as a rooted, directed graph" (source).
HDF5 groups (including the root group) and data sets are both "locations": in the C API most functions take a loc_id, which identifies a group or data set. These locations are the nodes of the graph, and paths describe arcs through the graph to a node. copy takes a source and a destination location, not specifically a group or dataset, so it can be applied to both. The source and destination do not need to be in the same file.
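As a sketch of the point that copy accepts any location, the following copies first a dataset and then a whole group into a second file. The file names, the /grp/data layout, and the helper copy_between_files are hypothetical and exist only for this example.

```python
import os
import tempfile

import h5py
import numpy as np


def copy_between_files(src_path, dest_path):
    """Copy one dataset and one whole group from src_path into dest_path."""
    with h5py.File(src_path, "r") as f_src, h5py.File(dest_path, "w") as f_dest:
        # Source is a dataset; destination is the root group of another file.
        f_src.copy(f_src["/grp/data"], f_dest, "data_copy")
        # Source is a group; its contents are copied recursively by default.
        f_src.copy(f_src["/grp"], f_dest, "grp_copy")


# Build a small throwaway source file to demonstrate.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "src.h5")
dest_path = os.path.join(tmp_dir, "dest.h5")
with h5py.File(src_path, "w") as f:
    f.create_dataset("/grp/data", data=np.arange(5))

copy_between_files(src_path, dest_path)

with h5py.File(dest_path, "r") as f:
    copied_names = sorted(f.keys())
    nested_values = f["/grp_copy/data"][...].tolist()
```

The same call shape is used for both copies; only the kind of source location differs.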

Attributes
Attributes are stored within the header of the group or data set they are associated with. Therefore the attributes are also associated with that "location". It follows that copying a group or dataset will include all attributes associated with that "location". However, you can turn this off by passing without_attrs=True to copy.
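A minimal sketch of the attribute behaviour, assuming a made-up dataset called data with a single units attribute. The without_attrs keyword is the h5py option that skips attributes.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical throwaway file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "attrs_demo.h5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset("data", data=np.arange(3))
    dset.attrs["units"] = "m"

    # Default: attributes travel with the object.
    f.copy(f["data"], f, "with_attrs")
    # without_attrs=True: copy the data but leave the attributes behind.
    f.copy(f["data"], f, "no_attrs", without_attrs=True)

    kept_names = list(f["with_attrs"].attrs.keys())
    dropped_names = list(f["no_attrs"].attrs.keys())
```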
References
copy offers settings for references, also called object pointers. Object pointers are a data type in HDF5, H5T_STD_REF_OBJ, similar to an integer type such as H5T_STD_I32BE (source), and can be stored in attributes or data sets. References can point to whole objects or to regions within a data set. copy only seems to cover object references. Does it break with data set region references (H5T_STD_REF_DSETREG)?
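Here is a rough sketch of the object-reference case only; it does not try to answer the region-reference question. The layout (a group grp holding a dataset target and a one-element dataset refs that points at it) is invented for the example, and it assumes an h5py version that provides h5py.ref_dtype. expand_refs is the h5py keyword for this option.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical throwaway files used only for this demonstration.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "refs_src.h5")
dest_path = os.path.join(tmp_dir, "refs_dest.h5")

with h5py.File(src_path, "w") as f:
    grp = f.create_group("grp")
    target = grp.create_dataset("target", data=np.arange(4))
    # A one-element dataset whose element is an object reference to target.
    refs = grp.create_dataset("refs", shape=(1,), dtype=h5py.ref_dtype)
    refs[0] = target.ref

with h5py.File(src_path, "r") as f_src, h5py.File(dest_path, "w") as f_dest:
    # Ask copy to follow object references while copying across files.
    f_src.copy(f_src["grp"], f_dest, "grp", expand_refs=True)

    # Dereference the copied reference inside the destination file.
    ref = f_dest["grp/refs"][0]
    resolved_values = f_dest[ref][...].tolist()
```

Dereferencing through the destination file is the check that matters: it shows whether the copied reference resolves to an object in the new file rather than in the original one.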

Symbolic links
The "locations" taken by the C API are one level of abstraction, which explains why the copy function works on individual datasets. Look at the figure again: it is the edges which are labelled, not the nodes. Under the hood, HDF5 objects are the targets of links; each link (edge) has a name, while the objects (nodes) do not. There are two types of links: hard links and symbolic links. All HDF5 objects must have at least one hard link, and hard links can only target objects within their own file. When a hard link is created the reference count increases by one; symbolic links do not affect the reference count. Symbolic links may point to objects within the file (soft) or to objects in other files (external). copy offers options to expand soft and external symbolic links.
This explains the error code (below) and offers an alternative to copying your dataset: an external link could allow access to a data set in another file.
RuntimeError: Unable to create link (interfile hard links are not allowed)
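If linking rather than copying suits your case, a sketch along these lines may help. The file names and the link name linked are hypothetical; h5py.ExternalLink is the h5py class for links that cross files.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical throwaway files used only for this demonstration.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "link_src.h5")
dest_path = os.path.join(tmp_dir, "link_dest.h5")

with h5py.File(src_path, "w") as f:
    f.create_dataset("/path/to/DataSet", data=np.arange(3))

with h5py.File(dest_path, "w") as f_dest:
    # A hard link cannot cross files, but an external link can.
    f_dest["linked"] = h5py.ExternalLink(src_path, "/path/to/DataSet")

    # Reading through the link opens the source file behind the scenes.
    linked_values = f_dest["linked"][...].tolist()
    # getlink=True returns the link object itself instead of its target.
    link_obj = f_dest.get("linked", getlink=True)
    is_external = isinstance(link_obj, h5py.ExternalLink)
```

Note that no data is duplicated here, so the link only works while the source file stays where the link expects it.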