
I am using the h5py package in Python 2.7, and I have a data.h5 file containing 2 datasets: A with size n by 2 and B with size n by 3. How can I concatenate them to get a single dataset C with size n by (2+3) columns? The datasets are huge, and I can't load them into memory.
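One way to do this without holding A or B in memory is to pre-allocate C on disk and copy rows over in blocks. The sketch below is a minimal, hedged illustration, not a definitive solution. It assumes A and B have the same number of rows, writes C into a separate output file, and uses a hypothetical helper `concat_columns` with an arbitrary default block size of 100000 rows. Only one block of rows from each source is in memory at a time, and writing each source into its own column range means no concatenated temporary is built. The code avoids f-strings so it runs on Python 2.7 as well as Python 3.

```python
import h5py
import numpy as np


def concat_columns(src_path, dst_path, name_a="A", name_b="B", name_c="C",
                   chunk_rows=100000):
    """Hypothetical helper: write [A | B] into a new dataset C block by block."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        a, b = src[name_a], src[name_b]
        n = a.shape[0]
        if b.shape[0] != n:
            raise ValueError("A and B must have the same number of rows")
        ncols_a = a.shape[1]
        dtype = np.result_type(a.dtype, b.dtype)
        # Pre-allocate the n x (2 + 3) output on disk; no data is read yet.
        c = dst.create_dataset(name_c, shape=(n, ncols_a + b.shape[1]),
                               dtype=dtype)
        for start in range(0, n, chunk_rows):
            stop = min(start + chunk_rows, n)
            # Each slice read pulls only rows [start, stop) into memory; writing
            # into separate column ranges avoids building a concatenated copy.
            c[start:stop, :ncols_a] = a[start:stop].astype(dtype, copy=False)
            c[start:stop, ncols_a:] = b[start:stop].astype(dtype, copy=False)
```

Usage would be `concat_columns("data.h5", "combined.h5")`. Writing C back into data.h5 itself should also work if the file is opened once in `"a"` mode, but that variant is not shown here.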

Nima
  • Make a chunked (and extensible?) dataset in another file (with 5 columns). Load the 2 datasets into memory chunk by chunk, concatenate them, and write the chunks to the new dataset. By specifying the columns to write to, you could skip the concatenation. In sum, write the data to a new dataset in chunks. – hpaulj Sep 19 '17 at 01:33
  • @hpaulj can you please show me an example of how to save the combined chunks into a single file? – Nima Sep 19 '17 at 01:45
  • https://stackoverflow.com/questions/43929420/how-to-concatenate-two-numpy-arrays-in-hdf5-format – hpaulj Sep 19 '17 at 05:02
  • @hpaulj thanks a lot! – Nima Sep 19 '17 at 15:15
  • Does this answer your question? [How to concatenate two numpy arrays in hdf5 format?](https://stackoverflow.com/questions/43929420/how-to-concatenate-two-numpy-arrays-in-hdf5-format) – demongolem Mar 20 '20 at 11:20
  • @demongolem I have the same question, and this is not the answer. I really like that HDF5 files are not actually loaded into memory but are read on demand. I want to concatenate the arrays without actually reading them, so that I can pass the result to my function, which will just step through it and do its job. I have too much data to load into memory at once, and it is saved as separate files. – Valentyn Jul 14 '20 at 14:02
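For the case in the last comment, where the arrays should be combined without reading or copying them, HDF5 virtual datasets are one option. A virtual dataset stores only a mapping onto the source datasets, and h5py reads through it lazily like a normal dataset. This is a hedged sketch: it assumes h5py 2.9 or later built against HDF5 1.10 or later (older installs do not support virtual datasets), assumes A and B have the same number of rows, and `make_virtual_concat` is a hypothetical helper name.

```python
import h5py
import numpy as np


def make_virtual_concat(src_path, dst_path, name_a="A", name_b="B", name_c="C"):
    """Hypothetical helper: expose C = [A | B] as a virtual dataset (no copy)."""
    with h5py.File(src_path, "r") as src:
        a, b = src[name_a], src[name_b]
        ncols_a = a.shape[1]
        layout = h5py.VirtualLayout(shape=(a.shape[0], ncols_a + b.shape[1]),
                                    dtype=np.result_type(a.dtype, b.dtype))
        # VirtualSource(dataset) records the source file name, dataset name
        # and shape; the column slices place A and B side by side.
        layout[:, :ncols_a] = h5py.VirtualSource(a)
        layout[:, ncols_a:] = h5py.VirtualSource(b)
    with h5py.File(dst_path, "w") as dst:
        # Only the mapping is written; the array data stays in src_path.
        dst.create_virtual_dataset(name_c, layout, fillvalue=0)
```

Slicing `C` then reads only the requested rows from the underlying files. Because the mapping records the source file path, moving or renaming the source files can break it. If A and B live in separate files, `h5py.VirtualSource(path, name, shape=...)` can point at each file directly instead of at an open dataset.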

0 Answers