
I am looking for a good storage format for large, gridded datasets. The application is meteorology, and we would prefer a format that is common within this field (to help exchange data with others). I don't need to deal with special data structures, and there should be a Fortran API. I am currently considering HDF5, GRIB2 and NetCDF4.

How do these formats compare in terms of data compression? What are their main limitations? How steep is the learning curve? Are there any other storage formats worth investigating?

I have not found a great deal of material outlining the differences and pros/cons of these formats (there is one relevant SO thread, and a presentation comparing GRIB and NetCDF).

nullglob
  • There is a nice Fortran wrapper for HDF5 called FUTILS - this simplifies writing HDF5 files a lot, at the expense of being able to use parallel HDF5 IO. – Chris Dec 07 '11 at 23:49

2 Answers


Sorry, I'm not in meteorology, but it looks to me like the scientific community is moving towards HDF5; see for example the NERSC page:

http://www.nersc.gov/users/training/online-tutorials/introduction-to-scientific-i-o/

I had to make the same choice for astrophysics data, as we historically use FITS, and I found it quite easy to start using HDF5, as there are APIs not only for Fortran and C but also for C++, and also a Python package (h5py).
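To give a feel for how little code is involved, here is a minimal sketch of writing and reading a gridded field with h5py (the dataset name, file name, and grid shape are made-up examples; this assumes numpy and h5py are installed):

```python
import numpy as np
import h5py

# Write a small 2-D gridded field (e.g. a coarse lat/lon grid) to HDF5.
temperature = np.random.rand(90, 180)
with h5py.File("example.h5", "w") as f:
    dset = f.create_dataset("temperature", data=temperature)
    dset.attrs["units"] = "K"  # attributes keep metadata next to the data

# Read it back.
with h5py.File("example.h5", "r") as f:
    field = f["temperature"][:]
    units = f["temperature"].attrs["units"]
```

The Fortran and C APIs are more verbose but follow the same file/dataset/attribute model.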

Andrea Zonca

I would certainly consider HDF5 as it seems to be the trend in the scientific community.

Also, HDF5 has built-in filters (including compression filters), and you can also write your own.
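As a sketch of what a built-in filter looks like in practice (shown via h5py for brevity; the file and dataset names are made-up examples), enabling GZIP compression is a single keyword argument:

```python
import os

import numpy as np
import h5py

# A smooth, repetitive field compresses very well.
data = np.zeros((1000, 1000))  # 8 MB of float64 zeros

with h5py.File("compressed.h5", "w") as f:
    # "gzip" is one of HDF5's built-in filters; compression_opts is the
    # level (0-9). Enabling a filter implicitly makes the dataset chunked.
    f.create_dataset("field", data=data, compression="gzip", compression_opts=4)

size = os.path.getsize("compressed.h5")  # far below the 8 MB of raw data
```

The filter is applied transparently on read, so downstream code does not need to know the dataset is compressed.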

Finally, take a look at HDF5 "chunked" datasets, as they might prove really useful if you have gridded datasets.
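For instance, a chunk layout can be matched to the typical access pattern. The sketch below (again via h5py; the hypothetical dataset is a (time, lat, lon) grid) stores one time step per chunk, so reading a full horizontal slice touches exactly one chunk:

```python
import numpy as np
import h5py

with h5py.File("chunked.h5", "w") as f:
    dset = f.create_dataset(
        "precip",
        shape=(365, 180, 360),
        dtype="f4",
        chunks=(1, 180, 360),  # one time step per chunk
    )
    dset[0, :, :] = 1.0  # chunks are allocated lazily, only when written

with h5py.File("chunked.h5", "r") as f:
    chunk_shape = f["precip"].chunks
    first_step = f["precip"][0, :, :]
```

Chunking is also what makes compression and partial I/O on large grids efficient, since only the chunks you touch are read and decompressed.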

http://www.hdfgroup.org/

mcieec