
Suppose I have multiple HDF5 files, each stored in a different directory that corresponds to a different date.

I can use the following code to read each file and then extract something useful from it.

However, the for loop that reads each file with h5py.File seems quite slow.

Is there a way to read the h5 files of a single day, or of all the days, together?

import h5py
from glob import glob

for date in date_needed:
    year = date.year
    dayofyear = date.dayofyear
    month = date.month
    day = date.day
    files = glob(path + '{}/{:0>3d}/data_{}{:0>2d}{:0>2d}_test.h5'.format(year, dayofyear, year % 100, month, day))
    for file_name in files:
        # open read-only and close automatically when done
        with h5py.File(file_name, 'r') as data:
            ...
            ...
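To make the question concrete, this is roughly what I mean by reading the files of one day together (a minimal sketch; the example day 2019/291 and the dataset name 'temperature' are just placeholders, since the real dataset names are elided above):

import numpy as np
import h5py
from glob import glob

# collect every file of one (hypothetical) day in a single glob call
day_files = sorted(glob(path + '2019/291/data_*_test.h5'))

arrays = []
for file_name in day_files:
    with h5py.File(file_name, 'r') as f:
        arrays.append(f['temperature'][:])   # placeholder dataset name

# stack the per-file arrays into one array for the whole day
if arrays:
    day_data = np.concatenate(arrays)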
Allen Zhang
  • Speeding this process up might not be possible. It appears that the LZF decompression algorithm is doing its best. Some tips shared there are to split the data into smaller chunks or to use uncompressed data: https://stackoverflow.com/a/55299043/. – Eric Leung Oct 18 '19 at 06:02
  • If I understand, you are opening 1 HDF5 file at a time (based on file name match to year, dayofyear, month, day). Opening a file is almost instantaneous. What do you do after you open it? I suspect that is where most of the time is spent. Run timeit() on the processes to find the bottlenecks. How big is each file? – kcw78 Oct 18 '19 at 13:36
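Following kcw78's suggestion to time the steps, here is a minimal sketch that splits the time spent opening each file from the time spent processing it (using time.perf_counter() rather than timeit, with a hypothetical wildcard pattern over all days; the processing step is just a placeholder):

import time
import h5py
from glob import glob

open_time = 0.0
process_time = 0.0
for file_name in glob(path + '*/*/data_*_test.h5'):   # assumed pattern covering all days
    t0 = time.perf_counter()
    f = h5py.File(file_name, 'r')
    open_time += time.perf_counter() - t0

    t0 = time.perf_counter()
    # ... placeholder for the actual extraction work on f ...
    f.close()
    process_time += time.perf_counter() - t0

print('opening: {:.3f} s, processing: {:.3f} s'.format(open_time, process_time))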

0 Answers