
How can I read a dataset that was compressed with the lzf compression filter and rewrite it with a native HDF5 third-party filter such as szip or zlib (gzip)? Would simply reading it as shown in How to read HDF5 files in Python, then writing a new dataset with compression specified, work?

beepboop
  • You can't _change_ the compression of a dataset once it has been created, but you could certainly use the approach you linked to create a _new_ dataset, either in the same file or a different one. – bnaecker Oct 29 '20 at 00:06

1 Answer


As @bnaecker said, you can't change the compression of an existing dataset, but you can read its data and create a new dataset with a different compression filter. The new dataset can be in the same file or a new one. Note: szip requires special licensing, so the example below goes from lzf to gzip. The process is the same for any two compression filters; just change the compression= value.

import h5py
import numpy as np

filename = "SO_64582861.h5"

# Create random data

arr1 = np.random.uniform(-1, 1, size=(10, 3))

# Create initial HDF5 file
with h5py.File(filename, "w") as h5f:
    h5f.create_dataset("ds_lzf", data=arr1, compression="lzf")
  
# Re-Open HDF5 file in 'append' mode
# Copy ds_lzf to ds_gzip with different compression setting
# could also copy to a second HDF5 file
with h5py.File(filename, "a") as h5f:
    # List all objects at the file root
    print("Keys: %s" % h5f.keys())
    arr2 = h5f["ds_lzf"][:]
    h5f.create_dataset("ds_gzip", data=arr2, compression="gzip")
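The same approach works across files, as noted in the comments above. Below is a minimal sketch of that variant: it reads the lzf dataset from the source file and writes it to a second file with gzip compression, also passing compression_opts to set the gzip level. The output filename SO_64582861_gzip.h5 is an assumption for illustration.

```python
import h5py
import numpy as np

src_name = "SO_64582861.h5"        # source file with the lzf dataset
dst_name = "SO_64582861_gzip.h5"   # hypothetical output filename

# Create a source file with an lzf-compressed dataset so this sketch is self-contained
with h5py.File(src_name, "w") as h5f:
    h5f.create_dataset("ds_lzf", data=np.random.uniform(-1, 1, size=(10, 3)),
                       compression="lzf")

# Copy to a second file, recompressing with gzip at level 9
with h5py.File(src_name, "r") as src, h5py.File(dst_name, "w") as dst:
    dst.create_dataset("ds_gzip", data=src["ds_lzf"][:],
                       compression="gzip", compression_opts=9)

# Verify the copy matches the original and uses the new filter
with h5py.File(src_name, "r") as src, h5py.File(dst_name, "r") as dst:
    assert np.array_equal(src["ds_lzf"][:], dst["ds_gzip"][:])
    print(dst["ds_gzip"].compression)  # "gzip"
```

For large datasets you may prefer to copy chunk by chunk instead of reading the whole array into memory with [:].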
kcw78