I've been using the 'rhdf5' package in R recently and have found it very useful, until I attempted to read a file of 190 MB or larger. In particular, I'm grabbing the data from a database, writing it to HDF5 format (successfully, regardless of size) and then reading it back into R at a later time. When my file size exceeds 190 MB, I get the following error:
Error: segfault from C stack overflow
In my case, this corresponds to a data frame with roughly 1,950,000 rows.
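For context, the round trip itself is just the default h5write()/h5read() calls; a rough sketch of what I'm doing is below (the file and object names are placeholders, and the real data frame comes from a database query):

library(rhdf5)
# writing succeeds regardless of size
h5createFile("big.h5")
h5write(big_df, file="big.h5", name="dat")
# later, in a new session: this read is what segfaults once the file passes ~190 MB
big_df2 = h5read("big.h5", "dat")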
While reading the package documentation, I got the idea that chunking the data might solve the problem. However, chunking doesn't seem to work for data frames, which are written as compound datasets. Here's some example code:
library(rhdf5)

# save a matrix with chunking: works
mat = cbind(1:10, 11:20)
h5createFile("test.h5")
h5createDataset(file="test.h5", dataset="dat", dim=c(10,2), chunk=c(5,2), level=7)
h5write(mat, file="test.h5", name="dat")
# convert to data frame: won't work now
df = as.data.frame(mat)
df[,2] = as.character(mat[,2])
h5createFile("test2.h5")
h5createDataset(file="test2.h5", dataset="dat", dim=c(10,2), chunk=c(5,2), level=7)
h5write(df, file="test2.h5", name="dat")
#h5write(df, file="test2.h5", name="dat", index=list(1:10, 1:2))
# try to use alternate function
fid = H5Fcreate("test3.h5")
h5createDataset(file="test3.h5", dataset="dat", dim=c(10,2), chunk=c(5,2), level=7)
h5writeDataset.data.frame(df, fid, name="dat", level=7, DataFrameAsCompound=FALSE)
#h5writeDataset.data.frame(df, fid, name="dat", level=7, DataFrameAsCompound=FALSE, index=list(1:10,1:2))
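One workaround I'm considering, sketched below and not yet tested at full scale (file, group and size values are placeholders), is to sidestep the compound type entirely: write each column of the data frame as its own chunked dataset inside a group, then reassemble the data frame after reading the columns back individually.

library(rhdf5)
df = data.frame(a=1:10, b=as.character(11:20), stringsAsFactors=FALSE)
h5createFile("test4.h5")
h5createGroup("test4.h5", "dat")
# one chunked dataset per column, so no compound type is involved
h5createDataset(file="test4.h5", dataset="dat/a", dims=10, chunk=5, level=7, storage.mode="integer")
h5createDataset(file="test4.h5", dataset="dat/b", dims=10, chunk=5, level=7, storage.mode="character", size=20)
h5write(df$a, file="test4.h5", name="dat/a")
h5write(df$b, file="test4.h5", name="dat/b")
# read the columns back individually and reassemble the data frame
df2 = data.frame(a=h5read("test4.h5", "dat/a"),
                 b=h5read("test4.h5", "dat/b"),
                 stringsAsFactors=FALSE)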
It's possible that chunking won't help at all. Either way, I'd appreciate any advice on reading large HDF5 files into R.
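In case it's relevant for an answer, the kind of block-wise read I'd be happy with looks roughly like this (assuming the data end up stored as a plain, non-compound two-column dataset named "dat"; the block size is arbitrary):

library(rhdf5)
n_rows = 1950000
block = 100000
starts = seq(1, n_rows, by=block)
pieces = lapply(starts, function(s) {
  e = min(s + block - 1, n_rows)
  # index = list(rows, columns); NULL selects all columns
  h5read("big.h5", "dat", index=list(s:e, NULL))
})
dat = do.call(rbind, pieces)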