
I have a project that uses HDF5. Each dataset has both a file structure on disk and an internal HDF5 data structure.

Think of a large video. Each frame is divided into equal pieces, which are written to multiple files and to multiple HDF5 data chunks. A single 'video' may span 20+ files (representing time steps and slices), plus more files for additional slices. The datasets aren't very large (under 30 GB), but they are still cumbersome.

My initial approach to associating (stitching) the pieces back together was to build an array of pointers to the individual frames and then stack them along the time axis of the video. This array would be fairly small, since it would only point to the locations on disk where the data lives. It would also limit how much data I'd have to hold in memory (always a bonus) when I scale to the 'larger' datasets.

However, I can't work out how to do this in Python, especially since I also want to tie in the metadata for each frame (pixels, their locations, etc.).


Is there a better method for referencing the data and 'stitching' it back together? My current method is to create NumPy arrays of the raw data, which has the drawback of reading all of the data in and storing it in memory (and on disk).
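The array-of-pointers idea described in the question can be sketched in plain h5py: keep a small list of (file, dataset, frame number) entries and read a frame from disk only when it is requested. This is a minimal sketch, assuming each piece file holds a 3-D dataset named `frames` with shape (time, height, width); `make_demo_files`, `FrameIndex`, the `part_*.h5` file names and that layout are hypothetical stand-ins for the real project files.

```python
import os
import tempfile

import h5py
import numpy as np


def make_demo_files(directory, n_files=3, frames_per_file=4, shape=(8, 8)):
    """Write small stand-in files; each holds a hypothetical 'frames' dataset (t, h, w)."""
    paths = []
    for i in range(n_files):
        path = os.path.join(directory, f"part_{i}.h5")
        start = i * frames_per_file
        # Global frame number n is filled with the value n, so frames are easy to identify.
        values = np.arange(start, start + frames_per_file, dtype="f4")
        data = values[:, None, None] * np.ones(shape, dtype="f4")
        with h5py.File(path, "w") as f:
            f.create_dataset("frames", data=data)
        paths.append(path)
    return paths


class FrameIndex:
    """An 'array of pointers': each entry is (file path, dataset name, frame number).

    Building the index reads only dataset shapes (metadata), never pixel data.
    """

    def __init__(self, paths, dset_name="frames"):
        self.entries = []
        for path in paths:
            with h5py.File(path, "r") as f:
                n_frames = f[dset_name].shape[0]
            self.entries.extend((path, dset_name, k) for k in range(n_frames))

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, i):
        path, name, k = self.entries[i]
        with h5py.File(path, "r") as f:
            # Slicing an h5py Dataset reads just this one frame from disk.
            return f[name][k]


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    index = FrameIndex(make_demo_files(tmp))
    print(len(index), index[5][0, 0])
```

Opening the file on every access keeps the sketch simple but adds overhead; for sequential playback you may prefer to keep the `h5py.File` handles open for the lifetime of the index.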

J.Hirsch
  • Have you considered creating an "Object or Region Reference"? Object references point to an object in the file (a dataset in your case). Region references always point to a dataset, and may be limited to a selection (aka a slice). Details here: [h5py Object/Region Refs](http://docs.h5py.org/en/stable/refs.html#refs) – kcw78 Dec 11 '19 at 21:52
  • No, I hadn't even known that was a thing. Does that work across files too? (The data is also spread across different files.) I'm still learning a lot about HDF. – J.Hirsch Dec 12 '19 at 14:13
  • Yes, HDF5 has lots beyond simple tables and arrays. :-) If you have data in different files, you should also investigate external links. With external links, you can have a single file (with the links) and use the links to access data in the linked files. I posted an answer that explains how to create external links for both h5py and PyTables. See this Q&A: [SO 58187004](https://stackoverflow.com/q/58187004/10462884) – kcw78 Dec 12 '19 at 21:51
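The region references suggested in the first comment can act as those 'pointers' inside a single file: each reference records a dataset plus a selection, so one small dataset of references can index every frame. A minimal sketch, assuming a hypothetical file with a `frames` dataset of shape (time, height, width) and a companion `frame_refs` dataset; these names and the helper functions are illustrative, not part of h5py. In h5py a reference is resolved against the file you dereference it in, so this covers data within one file; pieces in other files would still need something like the external links mentioned in the last comment.

```python
import os
import tempfile

import h5py
import numpy as np

FRAME_SHAPE = (8, 8)


def build_frame_refs(path, n_frames=4):
    """Write a hypothetical 'frames' dataset plus one region reference per frame."""
    size = n_frames * FRAME_SHAPE[0] * FRAME_SHAPE[1]
    with h5py.File(path, "w") as f:
        frames = f.create_dataset(
            "frames", data=np.arange(size, dtype="i4").reshape(n_frames, *FRAME_SHAPE)
        )
        refs = f.create_dataset("frame_refs", (n_frames,), dtype=h5py.regionref_dtype)
        for k in range(n_frames):
            # Each reference stores the target dataset *and* the selection (frame k).
            refs[k] = frames.regionref[k : k + 1, :, :]


def read_frame(path, k):
    """Follow the k-th region reference and read only the selected frame."""
    with h5py.File(path, "r") as f:
        ref = f["frame_refs"][k]
        target = f[ref]  # dereference: the dataset the reference points to
        data = target[ref]  # apply the stored selection to that dataset
        # Normalize the result to a 2-D (height, width) frame.
        return np.reshape(data, FRAME_SHAPE)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "video.h5")
    build_frame_refs(path)
    print(read_frame(path, 2)[0, :4])
```

The reference dataset is tiny compared with the frames, and per-frame metadata could be kept alongside it in the same file.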
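The external-link approach from the last comment can be sketched as a small 'master' file whose links point at the datasets in the piece files; reading through a link reads from the linked file, and no data is copied into the master. A minimal sketch, assuming each piece file holds a hypothetical `frames` dataset of shape (time, height, width); the file names, the `parts` group and the helper functions are illustrative only. Absolute paths are used for the link targets so the sketch does not depend on how HDF5 resolves relative paths.

```python
import os
import tempfile

import h5py
import numpy as np


def write_part(path, start, n_frames=4, shape=(8, 8)):
    """Write a hypothetical piece file whose frame n is filled with the value start + n."""
    values = np.arange(start, start + n_frames, dtype="f4")
    with h5py.File(path, "w") as f:
        f.create_dataset("frames", data=values[:, None, None] * np.ones(shape, dtype="f4"))


def build_master(master_path, part_paths, dset_name="frames"):
    """Create a master file containing only external links, one per piece file."""
    with h5py.File(master_path, "w") as f:
        parts = f.create_group("parts")
        for i, part in enumerate(part_paths):
            parts[f"part_{i}"] = h5py.ExternalLink(os.path.abspath(part), "/" + dset_name)


def read_frame(master_path, part, k):
    """Open the master file and read frame k of one linked piece on demand."""
    with h5py.File(master_path, "r") as f:
        return f["parts"][f"part_{part}"][k]


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    paths = [os.path.join(tmp, f"part_{i}.h5") for i in range(3)]
    for i, p in enumerate(paths):
        write_part(p, start=4 * i)
    master = os.path.join(tmp, "master.h5")
    build_master(master, paths)
    print(read_frame(master, 1, 2)[0, 0])
```

Because the master file stores only links, it stays tiny, and frames are pulled from the piece files one at a time when requested.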
