
I have a file in HDF5 format. I know that it is supposed to contain a matrix, and I want to read that matrix into R so that I can study it. I see that there is an h5r package that is supposed to help with this, but I cannot find a simple, easy-to-understand tutorial. Is such a tutorial available online? Specifically, how do you read an HDF5 object with this package, and how do you actually extract the matrix?

UPDATE

I found the rhdf5 package, which is not part of CRAN but is part of Bioconductor. Its interface is relatively easy to understand, and the documentation and example code are quite clear. I could use it without problems. My problem, it seems, was the input file: the matrix I wanted to read was actually stored in the HDF5 file as a Python pickle, so every time I tried to open and access it from R I got a segmentation fault. I figured out how to save the matrix from within Python as a TSV file, and now that problem is solved.
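Once the matrix has been exported from Python as a TSV file (for example with NumPy's `savetxt` and `delimiter="\t"`), base R can read it back without any HDF5 package. A minimal sketch, assuming a purely numeric file with no header row; a temporary file stands in for the real export so the example runs on its own:

```r
# Write a small demo matrix to a temporary TSV file
# (a stand-in for the real file exported from Python).
tsv_path <- tempfile(fileext = ".tsv")
demo <- matrix(c(1.5, 2, 3, 4, 5, 6.25), nrow = 2)
write.table(demo, tsv_path, sep = "\t", row.names = FALSE, col.names = FALSE)

# Read it back: no header, tab-separated, then convert the data frame to a matrix.
m <- as.matrix(read.table(tsv_path, header = FALSE, sep = "\t"))
dimnames(m) <- NULL  # drop the V1, V2, ... column names added by read.table

str(m)
```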

Sam

5 Answers


The rhdf5 package works really well, although it is not on CRAN. Install it from Bioconductor:

# as of 2020-09-08, these are the updated instructions per
# https://bioconductor.org/install/

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("rhdf5")

And to use it:

library(rhdf5)

List the objects within the file to find the data group you want to read:

h5ls("path/to/file.h5")
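`h5ls()` returns a data frame with (among others) `group`, `name`, `otype` and `dim` columns, so the full path to pass to `h5read()` is the group joined with the dataset name. A minimal sketch that first writes a throwaway demo file so it runs on its own; `mygroup` and `mydata` are placeholder names matching the answer's example:

```r
library(rhdf5)

# Build a small demo file (a stand-in for "path/to/file.h5").
h5file_path <- tempfile(fileext = ".h5")
h5createFile(h5file_path)
h5createGroup(h5file_path, "mygroup")
h5write(matrix(1:6, nrow = 2), h5file_path, "mygroup/mydata")

# List the contents; each row describes one group or dataset.
contents <- h5ls(h5file_path)
print(contents[, c("group", "name", "otype", "dim")])

# Keep only datasets and build full paths such as "/mygroup/mydata".
datasets <- contents[contents$otype == "H5I_DATASET", ]
paths <- ifelse(datasets$group == "/",
                paste0("/", datasets$name),
                paste0(datasets$group, "/", datasets$name))
print(paths)

h5closeAll()
```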

Read the HDF5 data:

mydata <- h5read("path/to/file.h5", "/mygroup/mydata")

And inspect the structure:

str(mydata)

(Note that multidimensional arrays may appear transposed, for example when the file was written from a row-major language such as C or Python.) You can also read whole groups, which are returned as named lists in R.
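The transposition comes from the storage order: HDF5 describes data in row-major (C) order while R is column-major. A matrix written and read by rhdf5 round-trips unchanged, but one written by a row-major tool such as h5py shows up in R with its dimensions swapped, and `t()` restores it. A minimal sketch; the "row-major" dataset is simulated by writing `t(A)` from R, which gives the same on-disk layout as a row-major writer saving `A` (an assumption you should check against your own file):

```r
library(rhdf5)

h5file_path <- tempfile(fileext = ".h5")
h5createFile(h5file_path)

A <- matrix(1:6, nrow = 2, byrow = TRUE)  # 2 x 3 matrix, rows (1,2,3) and (4,5,6)

# Case 1: written and read by rhdf5, so the orientation is preserved.
h5write(A, h5file_path, "from_r")
from_r <- h5read(h5file_path, "from_r")

# Case 2: simulate a 2 x 3 array saved by a row-major tool (e.g. h5py).
h5write(t(A), h5file_path, "row_major")
row_major <- h5read(h5file_path, "row_major")

print(dim(from_r))     # same shape as A
print(dim(row_major))  # dimensions swapped: the matrix appears transposed

fixed <- t(row_major)  # transpose back to the original orientation

h5closeAll()
```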

Mike T
  • Very good package indeed. I was thinking about using the `h5r` package from CRAN first but it seems underdocumented. If you don't mind depending on Bioconductor then `rhdf5` is definitely the way to go. – András Aszódi Jan 31 '14 at 11:04
  • It can be helpful to use the function h5ls("path/to/file.h5") before reading in the h5 file. – c.gutierrez Feb 03 '14 at 23:37
  • This is a good start. Here is a decent tutorial with detail on the use of rhdf5 http://www.r-bloggers.com/working-with-hdf-files-in-r-example-pathfinder-sst-data/ – mmann1123 Apr 13 '15 at 16:43
  • @Sam I don't understand what goes into the "/mygroup/mydata" part. Where do I see that information? – Herman Toothrot Aug 24 '16 at 16:50
  • @user4050 use h5ls to see the structure – Mike T Aug 24 '16 at 18:59
  • This package works with compound datatypes (I had to install the bit64 package first, though). – BigChief Sep 25 '16 at 20:31

You could also use h5, a package which I recently published on CRAN. Compared to rhdf5 it has the following features:

  1. S4 object model to directly interact with HDF5 objects like files, groups, datasets and attributes.
  2. Simpler syntax: R-like subsetting operators for datasets, supporting commands like readdata <- dataset[1:3, 1:3] and dataset[1:3, 1:3] <- matrix(1:9, nrow = 3)
  3. Support for NA values in all data types
  4. 200+ test cases with code coverage of 80%+.

To save a matrix you could use:

library(h5)
testmat <- matrix(rnorm(120), ncol = 3)
# Create HDF5 File
file <- h5file("test.h5")
# Save matrix to file in group 'testgroup' and datasetname 'testmat'
file["testgroup", "testmat"] <- testmat
# Close file
h5close(file)

... and read the entire matrix back into R:

file <- h5file("test.h5")
testmat_in <- file["testgroup", "testmat"][]
h5close(file)

See also h5 on

user625626

I used the rgdal package to read HDF5 files. Note, however, that the binary version of rgdal probably does not support HDF5. In that case, you need to build GDAL from source with HDF5 support before building rgdal from source.

Alternatively, try converting the files from HDF5 to netCDF. Once they are in netCDF, you can use the excellent ncdf package to access the data. I think the conversion could be done with the cdo tool.

Paul Hiemstra
  • Can you show me some template code on how to go about with this? – Sam Apr 12 '13 at 15:41
  • Your question right now is a bit broad. If you have more specific questions, including code examples, feel free to ask more questions. – Paul Hiemstra Apr 12 '13 at 19:57

The ncdf4 package, an interface to netCDF-4, can also be used to read HDF5 files (netCDF-4 is compatible with netCDF-3, but it uses HDF5 as the storage layer).

In the developer's words:

NetCDF-4 combines the netCDF-3 and HDF5 data models, taking the desirable characteristics of each, while taking advantage of their separate strengths

The netCDF-4 format implements and expands the netCDF-3 data model by using an enhanced version of HDF5 as the storage layer.

In practice, ncdf4 provides a simple interface, and migrating code from using older hdf5 and ncdf packages to a single ncdf4 package has made our code less buggy and easier to write (some of my trials and workarounds are documented in my previous answer).
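Because a netCDF-4 file is itself an HDF5 file, the ncdf4 workflow for a matrix is: open the file, look up the variable name, read it, close. A minimal sketch that first creates a small netCDF-4 file so it runs on its own; the variable name `mymatrix` is hypothetical. Assumption to check: ncdf4 goes through the netCDF library, so an arbitrary HDF5 file that uses features outside the netCDF-4 data model may not open this way.

```r
library(ncdf4)

nc_path <- tempfile(fileext = ".nc")

# Define two dimensions and one 2-D variable, then create a netCDF-4 file
# (force_v4 = TRUE selects the HDF5-based netCDF-4 format).
xdim <- ncdim_def("x", units = "", vals = 1:4)
ydim <- ncdim_def("y", units = "", vals = 1:3)
mvar <- ncvar_def("mymatrix", units = "", dim = list(xdim, ydim),
                  missval = -9999, prec = "double")
nc <- nc_create(nc_path, list(mvar), force_v4 = TRUE)
m <- matrix(as.numeric(1:12), nrow = 4)
ncvar_put(nc, mvar, m)
nc_close(nc)

# Reopen the file, list the variables, and read the matrix back.
nc <- nc_open(nc_path)
var_names <- names(nc$var)
m_in <- ncvar_get(nc, "mymatrix")
nc_close(nc)

print(var_names)
str(m_in)
```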

David LeBauer

This solution solved my problem:

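1. In a Linux terminal (Debian/Ubuntu), install the HDF5 development library:

   sudo apt-get install libhdf5-dev

2. Then, in the R console:

   install.packages("BiocManager")  # only if it is not installed yet
   BiocManager::install("rhdf5")
   install.packages("hdf5r")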
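Once hdf5r is installed, reading a matrix takes only a few lines with its R6 interface. A minimal sketch that reuses the group/dataset names from the h5 example in another answer (`testgroup`, `testmat`); the file is a temporary one created just for the demo, so point `H5File$new()` at your own file instead:

```r
library(hdf5r)

h5_path <- tempfile(fileext = ".h5")

# Create a file containing a group with a 40 x 3 matrix dataset.
f <- H5File$new(h5_path, mode = "w")
grp <- f$create_group("testgroup")
grp[["testmat"]] <- matrix(as.numeric(1:120), ncol = 3)
f$close_all()

# Reopen read-only, list the contents, and read the data.
f <- H5File$new(h5_path, mode = "r")
print(f$ls(recursive = TRUE))
ds <- f[["testgroup/testmat"]]
full <- ds[, ]          # the whole matrix
block <- ds[1:3, 1:3]   # only a 3 x 3 block is read from disk
f$close_all()

str(full)
```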

Omid Erfanmanesh