1

I'm wondering if it's possible to do the following with HDF5/CXI files:

  • have one external h5 file that stores a 4D numpy array
  • have another cxi file in which I need to add a group that contains an array of external links to the h5 file

In the cxi file I expect to get something like this:

Group1/subgroup  {num, ExternalLink to h5 file}

where num is the length of the required array of links.

This is what I tried:

import sys

import h5py as h5
import numpy as np

h5_file = sys.argv[1]
h5path = sys.argv[2]
cxi_file = sys.argv[3]
cxi_path = sys.argv[4]
num = int(sys.argv[5])  # argv values are strings; the link count must be an int

link = h5.ExternalLink(h5_file, h5path)
l = np.array([link] * num)

with h5.File(cxi_file, 'a') as f:
    dset = f.create_dataset(cxi_path, (num,))
    for i in range(num):
        dset[i] = l[i]

But it didn't work. I also tried dset = f.create_dataset(path_to_new_mask, data=l), and I tried building a plain list of length num instead of a numpy array, but all of these attempts failed.

I'd be very grateful if someone could help.

kitsune_breeze

2 Answers

1

kitsune_breeze, I reviewed the Q&A and the comments. There are several areas that need to be clarified. Let's start with external links versus object or region references. As I understand it, you want to create a dataset (aka an array) of external links (with each link referencing a different HDF5 file).

The answer from Mahsa Hassankashi on 19-April describes how to create a dataset of dtype=h5py.ref_dtype or dtype=h5py.regionref_dtype. The first is an object reference, and the second is a region reference. They are not the same as external links! Also, the example code requires h5py 2.10.0 and you are using h5py 2.9.0. (FYI, there is a solution to this in 2.9.0 if you choose to use object or region references.)

Here's the bad news: based on my tests, you can't create a dataset (or np array) of HDF5 external links. Here are the steps to see why:

In [1]: import h5py
In [2]: h5fw = h5py.File('SO_61290760.h5',mode='w')
# create an external link object
In [3]: link_obj = h5py.ExternalLink('file1.h5','/')
In [4]: type(link_obj)
Out[4]: h5py._hl.group.ExternalLink
In [5]: link_dtype = type(link_obj)
In [6]: h5fw.create_dataset("MyRefs", (10,), dtype=link_dtype)
Traceback (most recent call last):
...
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Reading the h5py documentation, it appears object and region references are also dtype('O') datatypes, and require additional metadata to implement them. There is no mention that this was done for External Links. As a result, I don't think you can create an array of External Links (because there isn't a dtype to support them).

That said, you can still create External Links from 1 HDF5 file to multiple HDF5 files. I have a simple example in my answer to "How can I combine multiple .h5 file?" (look under Method 1: Create External Links).
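As a rough illustration of that idea applied to the layout in the question, the sketch below creates a group that holds num external links, one group member per link, instead of a dataset of links. The helper name link_group, the file names data.h5 and links.cxi, and the link names '0', '1', ... are assumptions made up for this example; only h5py.ExternalLink and the Group API come from h5py.

```python
import os
import tempfile

import h5py
import numpy as np


def link_group(cxi_file, group_path, h5_file, h5_path, num):
    """Create `num` external links named '0'..'num-1' under `group_path`.

    Hypothetical helper: each link is a member of the group, not an
    element of a dataset.
    """
    with h5py.File(cxi_file, "a") as f:
        grp = f.require_group(group_path)
        for i in range(num):
            grp[str(i)] = h5py.ExternalLink(h5_file, h5_path)


# Throwaway files in a temporary directory (names are assumptions).
workdir = tempfile.mkdtemp()
h5_file = os.path.join(workdir, "data.h5")
cxi_file = os.path.join(workdir, "links.cxi")

# The external file with a 4D array, as described in the question.
with h5py.File(h5_file, "w") as f:
    f.create_dataset("/data", data=np.zeros((2, 3, 4, 5)))

link_group(cxi_file, "Group1/subgroup", h5_file, "/data", num=3)

with h5py.File(cxi_file, "r") as f:
    grp = f["Group1/subgroup"]
    names = sorted(grp.keys())
    # getlink=True returns the link object itself instead of following it.
    link = grp.get("0", getlink=True)
    # Indexing the group follows the link into the external file.
    shape = grp["0"].shape

print(names, type(link).__name__, shape)
```

This gives a group with num members rather than a single Dataset {num, ExternalLink}, so it is a workaround for the missing dtype and not the exact structure requested.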

If you decide to use Object or Region References, you need to use a different dtype specification in h5py 2.9.0.
Object Reference:
2.10.0 use: h5py.ref_dtype
2.9.0 use: h5py.special_dtype(ref=h5py.Reference)
Region Reference:
2.10.0 use: h5py.regionref_dtype
2.9.0 use: h5py.special_dtype(ref=h5py.RegionReference)

The code below demonstrates the behavior in 2.9.0:

In [9]: type(h5py.ref_dtype)
Traceback (most recent call last):
...
AttributeError: module 'h5py' has no attribute 'ref_dtype'

In [10]: type(h5py.special_dtype(ref=h5py.Reference))
Out[10]: numpy.dtype

In [11]: type(h5py.regionref_dtype)
Traceback (most recent call last):
...   
AttributeError: module 'h5py' has no attribute 'regionref_dtype'

In [12]: type(h5py.special_dtype(ref=h5py.RegionReference))
Out[12]: numpy.dtype

In [13]: dset = h5fw.create_dataset("MyRefs", (10,), dtype=h5py.special_dtype(ref=h5py.Reference))

In [14]: dset.dtype
Out[14]: dtype('O')
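To round this out, here is a minimal sketch of filling such a dataset with object references and reading one back. The file name refs_demo.h5 and the dataset names are assumptions for the example. Note that, as far as I know, object references point to objects in the same file, so this does not replace external links to another file.

```python
import os
import tempfile

import h5py
import numpy as np

# Throwaway file in a temporary directory (name is an assumption).
path = os.path.join(tempfile.mkdtemp(), "refs_demo.h5")

# The 2.9.0-compatible spelling of the object reference dtype.
ref_dtype = h5py.special_dtype(ref=h5py.Reference)

with h5py.File(path, "w") as f:
    target = f.create_dataset("data", data=np.arange(6).reshape(2, 3))
    refs = f.create_dataset("MyRefs", (4,), dtype=ref_dtype)
    for i in range(4):
        # Every element references the same dataset in this file.
        refs[i] = target.ref

with h5py.File(path, "r") as f:
    # Indexing the file with a reference dereferences it.
    first = f[f["MyRefs"][0]]
    name = first.name
    total = int(first[...].sum())

print(name, total)
```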
kcw78
  • OK, thank you for such a detailed explanation. I have also run into those errors, which is why I asked here. At the moment I am trying to update my approach and use a VDS (virtual dataset), and/or use an internal (soft) link instead of an external link. It seems there should be a way to organize an array of links that can be recorded in the initial hdf5 file. – kitsune_breeze Apr 21 '20 at 06:59
0

Try this:

import h5py

myfile = h5py.File('foo.hdf5','w')

myfile['ext link'] = h5py.ExternalLink("otherfile.hdf5", "/path/to/resource")

dset = myfile.create_dataset("MyRefs", (100,), dtype=h5py.ref_dtype)

Or:

dset = myfile.create_dataset("ref", (2,), dtype=h5py.regionref_dtype)
  1. http://docs.h5py.org/en/stable/refs.html#storing-references-in-a-dataset
  2. http://docs.h5py.org/en/latest/high/group.html#external-links
Mahsa Hassankashi
  • This error appears: ```AttributeError: module 'h5py' has no attribute 'ref_dtype'``` – kitsune_breeze Apr 18 '20 at 15:59
  • @kitsune_breeze You are using an alias, so please replace h5py with h5. If the error remains, please update h5py to 2.10.0 with "conda install -c conda-forge h5py" (the Anaconda channel is still on 2.9.0). – Mahsa Hassankashi Apr 18 '20 at 16:13
  • I did that. I have no permission to update the version, but I got past the error with ```dt = h5.special_dtype(ref=h5.Reference)\n dset = f.create_dataset(path, (num,), dtype=dt)```. Nevertheless, there is another problem: ```TypeError: Can't convert incompatible object to HDF5 object reference```. I created the link beforehand: ``` link = h5.ExternalLink(fnew_m, path1) l = [link]*num``` ``` with h5.File(file_cxi, 'a') as f: dt = h5.special_dtype(ref=h5.Reference) dset = f.create_dataset(path1, (num,), dtype=dt) for i in range(num): dset[i] = l[i] ``` – kitsune_breeze Apr 18 '20 at 16:25
  • @kitsune_breeze or use it: dset = f.create_dataset("ref", (2,), dtype=h5py.regionref_dtype) – Mahsa Hassankashi Apr 18 '20 at 16:26
  • If you do not have permission, use sudo, and if you are on Windows, log in as administrator. – Mahsa Hassankashi Apr 18 '20 at 16:35
  • Nope, I work on a cluster, so I have no permission. I tried sudo. – kitsune_breeze Apr 18 '20 at 16:38
  • Did you use it? dtype=h5py.regionref_dtype – Mahsa Hassankashi Apr 18 '20 at 16:40
  • Sorry put this line before create_dataset "str_type = h5.new_vlen(str)" – Mahsa Hassankashi Apr 18 '20 at 16:53
  • It recorded it, but did not mark it as an external link – kitsune_breeze Apr 18 '20 at 17:00
  • You meant that you could save it on the database but not as an external link? – Mahsa Hassankashi Apr 18 '20 at 17:02
  • Yes, it is saved as something, but not as an external link, because h5ls shows it as just a dataset with a size, without reporting a link type. If you try it with one link, it is shown as Dataset {}; I want to have Dataset {100, ExternalLink} – kitsune_breeze Apr 18 '20 at 17:04
  • Did you use: myfile = h5py.File('foo.hdf5','w') myfile['ext link'] = h5py.ExternalLink("otherfile.hdf5", "/path/to/resource") – Mahsa Hassankashi Apr 18 '20 at 17:39
  • I used this: ` link = h5.ExternalLink(f1, path1) l = [link]*num with h5.File(file_cxi, 'a') as f: str_type = h5.special_dtype(vlen=str) dset = f.create_dataset(path2, (num,), dtype=str_type) for i in range(num): dset[i] = link ` – kitsune_breeze Apr 18 '20 at 17:43
  • @kitsune_breeze, are you still trying to solve your problem? – kcw78 Apr 19 '20 at 19:39
  • @kcw78, sure, I am trying – kitsune_breeze Apr 20 '20 at 07:17
  • @kcw78, at the moment I am trying to deal with VDS magic, but the question is still open. I have no idea how to create multiple links to the same data; a VDS can handle it, but it opens slowly – kitsune_breeze Apr 20 '20 at 09:38