74

Anyone out there have enough experience w/ NetCDF and HDF5 to give some pluses / minuses about them as a way of storing scientific data?

I've used HDF5 and would like to read/write via Java but the interface is essentially a wrapper around the C libraries, which I have found confusing, so NetCDF seems intriguing but I know almost nothing about it.

edit: my application is "only" for datalogging, so that I get a file that has a self-describing format. Important features for me are being able to add arbitrary metadata, having fast write access for appending to byte arrays, and having single-writer / multiple-reader concurrency (strongly preferred but not a must-have. NetCDF docs say they have SWMR but don't say whether they support any mechanism for ensuring that two writers can't open the same file at once with disastrous results). I like the hierarchical aspect of HDF5 (in particular I love the directed-acyclic-graph hierarchy, much more flexible than a "regular" filesystem-like hierarchy), am reading the NetCDF docs now... if it only allows one dataset per file then it probably won't work for me. :(
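On the two-writers worry: independent of which file format is chosen, a Java writer can take an advisory OS-level lock before it opens the data file. This is a minimal sketch, assuming a sidecar `.lock` file next to the data file. `WriterLock` and `acquire` are hypothetical names, not part of HDF5 or NetCDF. `FileLock` is advisory and its behaviour is platform-dependent, so readers and other tools are not blocked by it.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not an HDF5/NetCDF API): at most one writer per data file. */
final class WriterLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private WriterLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Takes the exclusive lock for dataFile, or throws if a writer already holds it. */
    static WriterLock acquire(Path dataFile) throws IOException {
        // Assumption: the lock lives in a sidecar file, e.g. "run1.h5.lock".
        Path lockFile = dataFile.resolveSibling(dataFile.getFileName() + ".lock");
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock;
        try {
            lock = ch.tryLock();      // null: another process holds the lock
        } catch (OverlappingFileLockException e) {
            lock = null;              // this JVM already holds the lock
        }
        if (lock == null) {
            ch.close();
            throw new IOException("another writer already holds " + lockFile);
        }
        return new WriterLock(ch, lock);
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path data = Files.createTempFile("datalog", ".h5");
        try (WriterLock first = WriterLock.acquire(data)) {
            try (WriterLock second = WriterLock.acquire(data)) {
                System.out.println("unexpected: two writers");
            } catch (IOException e) {
                System.out.println("second writer rejected");
            }
        }
    }
}
```

The writer would hold the `WriterLock` for as long as it has the data file open; readers simply skip the lock, which matches the single-writer / multiple-reader pattern only if the file format itself tolerates concurrent reads.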

update: looks like NetCDF-Java reads netCDF-4 files but only writes netCDF-3 files, which don't support hierarchical groups. darn.

update 2009-Jul-14: I am starting to get really upset with HDF5 in Java. The library available isn't that great and it has some major stumbling blocks that have to do with Java's abstraction layers (compound data types). A great file format for C but looks like I just lose. >:(

Sled
Jason S
  • 3
    postscript: HDF5 is *much* easier to use in Python with PyTables, than Java. – Jason S Nov 11 '14 at 13:46
  • Unfortunately for Java users, both netCDF and HDF5 are developed in C, primarily for C or Fortran users. Most of the other APIs, like Python, are built atop the C layer. – Edward Hartnett Jun 15 '16 at 13:56
  • @EdwardHartnett -- I don't buy that argument. Certainly it means that you don't get any nice Java features for free, but people have taken the plunge to create useful APIs in Python. There's no reason someone couldn't do that in Java. (And actually, I did that myself -- to a small extent -- at a former company when I posted this question back in 2009, but I don't have access to that code.) – Jason S Jun 15 '16 at 21:15

7 Answers

32

I strongly suggest you use HDF5 instead of NetCDF. NetCDF is flat, and it gets very messy after a while if you can't classify your data. Of course, how to classify is also a matter of debate, but at least HDF5 gives you the flexibility.

We performed an accurate evaluation of HDF5 vs. NetCDF when I wrote Q5Cost, and the final result was for HDF5 hands down.

Stefano Borini
  • 53
    the answer is outdated - netCDF is now built on HDF5 – Abe Oct 11 '13 at 00:04
  • @abe not necessarily. netcdf4 still has some backward compatibility with netcdf3. that means some compression options still aren't available to nc files. – badgley Oct 11 '13 at 17:37
  • 1
    @badgley - what compression options are missing from netCDF when using it to write netCDF-4 files? – Sean A. Apr 21 '15 at 18:40
  • @StefanoBorini Would be great if you could clarify whether your evaluation still applies to NetCDF-4/HDF5 or only earlier versions. – spinkus Apr 20 '16 at 10:58
  • 1
    NetCDF-4 exposes almost all the features of HDF5, including compression. H5utils will work on netCDF-4 files, which are also perfectly valid HDF5 files. – Edward Hartnett May 02 '16 at 21:48
  • What exactly does this "NetCDF-4 on top of HDF5" mean? Should the user adopt NetCDF-4 and create/work with HDF5 files through it, because the API is more forgiving or sits "one above" the other? (Fyi. I am currently checking capabilities of MATLAB https://www.mathworks.com/help/matlab/scientific-data.html) –  Aug 21 '23 at 15:31
25

I'll have to admit using HDF5 is very much easier in the long run. It's not hard to get simple data structures into NetCDF format, but manipulating them down the road is kind of a pain.

The "H" in HDF5 stands for "hierarchical", which translated (for me anyway) into a REALLY easy way to manipulate data, by just moving nodes around and referencing nodes from other places.

Can I ask what kind of project this is? I use these both for a lot of HPC scientific modeling tasks. Can I assume you're doing the same? If so, the trend I'm seeing is people moving to HDF5, but that might be different in your particular domain.

However you end up going, best of luck!

Mike
  • 3
    afaik, NetCDF4 is a kind of dumbed down HDF5 so that it is familiar to those used to previous versions of NetCDF. http://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2010/msg00170.html – mdsumner Dec 24 '10 at 06:55
  • 1
    It is, but it's more that they've tried to impose structure than dumb it down - https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_introduction.html#netcdf_4_format. – spinkus Apr 20 '16 at 10:41
  • 1
    NetCDF-4 exposes almost all HDF5 features, except for some pretty obscure exceptions. – Edward Hartnett May 02 '16 at 21:54
23

NetCDF, starting with version 4.0 (2008), can read and write most HDF5 files, and provides access to the hierarchical features of HDF5 via the enhanced data model.

HDF5 is extremely feature-rich, and has some great performance features.

NetCDF has a simpler API, and a much wider tool base. There are many tools that handle netCDF data.

Edward Hartnett
  • Last I checked, the Java library didn't allow for writing HDF5 files. Anyway, it's a moot point as I've moved on to other things. :-/ – Jason S Jul 18 '11 at 11:37
  • Thanks for the concise answer, that's very useful info, although it'd be even better if it had some references :) – naught101 Oct 17 '13 at 04:48
  • "can read and write most HDF5 files". No it can't. NetCDF4 uses HDF5 like an application uses a filesystem. It reads and writes a specific structure imposed on HDF5 1.8. – spinkus Apr 20 '16 at 10:43
  • NetCDF-4 can read all HDF5 files that don't use references or have circular group structure. For a full list of restrictions on HDF5 files that can be read by netCDF-4, see the FAQ: http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#How-can-I-convert-HDF5-files-into-netCDF-4-files – Edward Hartnett May 07 '16 at 15:33
11

I know this is an older post, and the original poster has indicated they've moved on, but for anyone that ends up here...the netCDF-Java library (as of 4.3.13) has netCDF-4 write support via the netCDF C library. It's still in beta, but it does work and feedback is certainly appreciated!

Please see the netCDF-Java reference docs for more details.

Sean A.
9

1) The Netcdf-4 C library is a layer on top of the HDF-5 C library. The API is considered simpler than the HDF5 library, but in the end you have pretty much the same functionality. Netcdf does not support graphs, but HDF5 does. In fact, HDF does not prevent cycles in your graph, I think.

2) The HDF Group has a Java API on top of the HDF-5 C library.

3) Unidata has the Netcdf-Java library, which is pure Java but can only read HDF-5.
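The graph point in 1) can be made concrete with a toy model. The sketch below is not an HDF5 or netCDF API: `GroupGraph`, `link` and `hasCycleFrom` are made-up names for an in-memory map of group links. It only illustrates the difference between a shared child (two groups linking to the same group, still acyclic) and a true cycle (a link leading back to an ancestor).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of group links; not an HDF5 or netCDF API. */
final class GroupGraph {
    // group name -> names of the groups it links to
    private final Map<String, List<String>> links = new HashMap<>();

    /** Records a link from parent to child (a group may have several parents). */
    void link(String parent, String child) {
        links.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    /** True if following links from root leads back to a group on the current path. */
    boolean hasCycleFrom(String root) {
        return visit(root, new HashSet<>(), new HashSet<>());
    }

    // Depth-first search: onPath holds the current chain, done holds finished groups.
    private boolean visit(String group, Set<String> onPath, Set<String> done) {
        if (onPath.contains(group)) {
            return true;               // link back to an ancestor: a cycle
        }
        if (done.contains(group)) {
            return false;              // shared child already explored: fine in a DAG
        }
        onPath.add(group);
        for (String child : links.getOrDefault(group, Collections.emptyList())) {
            if (visit(child, onPath, done)) {
                return true;
            }
        }
        onPath.remove(group);
        done.add(group);
        return false;
    }

    public static void main(String[] args) {
        GroupGraph g = new GroupGraph();
        g.link("/", "/a");
        g.link("/", "/b");
        g.link("/a", "/shared");
        g.link("/b", "/shared");       // two parents, still acyclic
        System.out.println(g.hasCycleFrom("/"));   // false
        g.link("/shared", "/");        // link back to the root
        System.out.println(g.hasCycleFrom("/"));   // true
    }
}
```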

John Caron
  • Because HDF5 does not implement shared dimensions, there is an argument (disclaimer: by me) that you should write netCDF-4, not directly HDF5, details here: http://www.unidata.ucar.edu/blogs/developer/en/entry/dimensions_scales. – John Caron Oct 19 '15 at 01:23
8

Try writing some small sample application in each, and compare the experience. If future scalability of your code to parallel execution (via MPI or the like) is important to you, I know that HDF has a parallel implementation, which people are constantly working to improve. I'm not sure about NetCDF.

Late edit: For NetCDF, there is now Parallel NetCDF from Argonne. It works quite well, and the development team is quite active in improving it further.

Phil Miller
  • Parallel IO is also supported directly by Unidata's netCDF library, which uses either HDF5 or parallel-netcdf under the covers to provide parallel IO. – Edward Hartnett May 02 '16 at 21:51
-2

NetCDF, which translates HDF5 into its own data model, looks and works great... until you find out that NetCDF doesn't support unsigned values! See also my question on how to detect unsigned values in existing HDF5 files using NetCDF.

Update: Actually, it turns out that although NetCDF-3 doesn't support signed values, NetCDF-4 supports signed values, even though the NetCDF API in Java for determining signedness is a little convoluted.

Community
Garret Wilson
  • 1
    Um... half your answer says that NetCDF doesn't support *unsigned* values, and the other half suggests it doesn't support *signed* values. Which is it gonna be? The first link only says that NetCDF 3 doesn't have unsigned *integers*, not values generally. Also, the second link indicates the problem is with *java*, not netCDF4. And really, what does it matter anyway? It means you have half as many integers for indexing, but you still have 2^31 (= 2 billion) or 2^63 (9 * 10^18), depending on your system. – naught101 Oct 17 '13 at 04:59
  • 2
    To clarify, the netCDF-4 C library supports unsigned integers (8, 16, 32, and 64 bit). The netCDF Java library cannot create unsigned types, but can read unsigned types of size 8, 16, and 32 bits by promoting them to signed types of the next larger size. (That is, a 16-bit unsigned integer field in the netCDF file will look like a 32-bit signed field in java.) This is all due to the fact that Java does not support unsigned types. – Edward Hartnett May 02 '16 at 21:45
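The promotion described in the comment above can be sketched in plain Java. This only illustrates the arithmetic and is not the netCDF-Java library's code; the class and method names are made up for the example. The standard library offers equivalent helpers such as `Short.toUnsignedInt` and `Integer.toUnsignedLong`.

```java
/** Illustration only: widening unsigned values stored in Java's signed types. */
final class UnsignedPromotion {
    private UnsignedPromotion() {}

    /** Reads the 8 bits of raw as an unsigned value, returned as a 16-bit short. */
    static short widenUnsignedByte(byte raw) {
        return (short) (raw & 0xFF);
    }

    /** Reads the 16 bits of raw as an unsigned value, returned as a 32-bit int. */
    static int widenUnsignedShort(short raw) {
        return raw & 0xFFFF;
    }

    /** Reads the 32 bits of raw as an unsigned value, returned as a 64-bit long. */
    static long widenUnsignedInt(int raw) {
        return raw & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        short stored = (short) 0xFFFF;                    // bit pattern of unsigned 65535
        System.out.println(stored);                       // -1 when read as signed
        System.out.println(widenUnsignedShort(stored));   // 65535 after promotion
        System.out.println(widenUnsignedInt(-1));         // 4294967295
    }
}
```

No such promotion exists for 64-bit unsigned values, since Java has no larger primitive integer type to widen into.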