My team has been given HDF5 files to read. They contain structured data with unsigned variables. My team and I were overjoyed to find the NetCDF library, which allows pure-Java reading of HDF5 files, albeit using the NetCDF data model.

No problem, we thought; we'd just translate from the NetCDF data model to whatever model we wanted, as long as we got the data out. Then we tried to read an unsigned 32-bit integer from the HDF5 file. We can load up HDFView 2.9 and see that the variable is an unsigned 32-bit integer. But... it turns out that NetCDF-3 doesn't support unsigned values!

To add insult to injury, NetCDF-3 recommends that you "widen the data type" or use an _Unsigned = "true" attribute (I am not making this up) to indicate that the 32 bits should be treated as an unsigned value.

Well, maybe those kludges would be effective if I were creating NetCDF data from scratch, but how can I detect, using NetCDF, that a 32-bit value in an existing HDF5 file should be interpreted as unsigned?

Update: Apparently NetCDF-4 does support unsigned data types. So this raises the question: How can I determine whether a value is signed or unsigned from the NetCDF Java library? I don't see any unsigned types in `ucar.ma2.DataType`.

Garret Wilson
  • What about one of the other pieces of software listed on the HDF Group website? http://www.hdfgroup.org/products/hdf5_tools/ – Robert Harvey Apr 30 '13 at 22:33
  • @RobertHarvey, I don't see any other pure-Java libraries at that page. And thanks for the link, but even if they did my question would still remain unanswered. – Garret Wilson Apr 30 '13 at 22:37

3 Answers

Yes, you can look for the `_Unsigned = "true"` attribute, or you can call `Variable.isUnsigned()`.

Because Java doesn't support unsigned types, it was a difficult design decision. Ultimately we decided not to automatically widen the type, for efficiency. So the application must check and do the right thing. Look at the `ucar.ma2.DataType.unsignedXXX()` helper methods.

When you read the data, you get an `Array` object. You can call `Array.isUnsigned()`. Also, extractors like `Array.getDouble()` will convert correctly.
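To illustrate what "check and do the right thing" means in practice, here is a minimal pure-Java sketch. It does not use the netCDF-Java API: `UnsignedReinterpret`, `widen`, and `toDouble` are hypothetical names, and the `unsigned` flag stands in for whatever a signedness check such as `isUnsigned()` reports. It assumes Java 8 or later for `Integer.toUnsignedLong`.

```java
public class UnsignedReinterpret {

    // Hypothetical helper: widen a raw 32-bit value to a long,
    // honoring a signedness flag obtained elsewhere.
    public static long widen(int raw, boolean unsigned) {
        // Integer.toUnsignedLong zero-extends; a plain cast sign-extends.
        return unsigned ? Integer.toUnsignedLong(raw) : (long) raw;
    }

    // Hypothetical helper: convert to double after choosing the interpretation.
    public static double toDouble(int raw, boolean unsigned) {
        return (double) widen(raw, unsigned);
    }

    public static void main(String[] args) {
        int raw = 0xFFFFFFFF; // the same 32 bits in both cases

        System.out.println(widen(raw, false)); // prints -1
        System.out.println(widen(raw, true));  // prints 4294967295
    }
}
```

The stored bits never change; only the interpretation does, which is why the signedness flag has to travel alongside the value.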

The netCDF-Java library supports an extended data model called the "Common Data Model" to abstract out differences in file formats. So we are not stuck with the limits of the netCDF-3 file format or data model. But we are in Java.

John

John Caron

Given that Java doesn't have unsigned types, I think the only options are to 1) automatically widen unsigned data (turn bytes into shorts, shorts into ints, ints into longs), or 2) represent both signed and unsigned integers with the available Java data types, and let the user decide if/when it should be widened.

Arguably the main use for unsigned data is to represent bits, and in that case conversion would be a waste, since you will just mask and test the bits.
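As a rough sketch of the bit-flag case, the following pure-Java example tests individual bits of a 32-bit value without widening it. The class name `FlagBits` and the choice of bit positions are made up for illustration and are not part of any library.

```java
public class FlagBits {

    // Hypothetical helper: test bit number `bit` (0 = least significant,
    // 31 = most significant) of a raw 32-bit value.
    public static boolean isSet(int raw, int bit) {
        // Compare against zero with !=, not >, because the mask for
        // bit 31 is itself a negative int.
        return (raw & (1 << bit)) != 0;
    }

    public static void main(String[] args) {
        int raw = 0x80000000; // only the top bit set; negative as a Java int

        System.out.println(isSet(raw, 31)); // prints true
        System.out.println(isSet(raw, 0));  // prints false
    }
}
```

Masking works on the bit pattern, so the result is the same whether the value is considered signed or unsigned.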

The other main use is for, e.g., satellite data, which often uses unsigned bytes, and there again I think not automatically widening is the right choice. What you end up doing is just widening right at the point you use it.
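Widening at the point of use might look like the sketch below, which keeps samples in a compact `byte[]` and widens each one only while summing. `UnsignedByteStats` and `mean` are hypothetical names, the sample values are invented, and `Byte.toUnsignedInt` assumes Java 8 or later (on older versions, `b & 0xFF` does the same thing).

```java
public class UnsignedByteStats {

    // Hypothetical helper: mean of samples stored as unsigned bytes.
    public static double mean(byte[] samples) {
        long sum = 0;
        for (byte b : samples) {
            // Widen each sample only here, where the value is consumed.
            sum += Byte.toUnsignedInt(b);
        }
        return (double) sum / samples.length;
    }

    public static void main(String[] args) {
        // 200 does not fit in a signed byte; it is stored as -56.
        byte[] samples = { (byte) 200, (byte) 100, (byte) 0 };

        System.out.println(mean(samples)); // prints 100.0
    }
}
```

The array stays at one byte per sample, which is the efficiency argument for not widening automatically.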

John Caron
  • I have no argument with option #2---I agree completely. What I find disingenuous is that the Java library, in its enumeration of types, doesn't include any unsigned types. Instead, it claims that all integer types are signed (even if they are really HDF5 unsigned types!), and requires the user to check some kludge attribute (that isn't part of the underlying HDF5 data!) to see if the data is really intended to be unsigned. If the Java API, like the other APIs, would simply indicate the true underlying type, then the user could decide if/when it should be widened without jumping through hoops. – Garret Wilson May 01 '13 at 20:07
  • "Instead, it claims that all integer types are signed" Where does the library claim that? – John Caron May 02 '13 at 21:08
  • The enumeration of data types, `ucar.ma2.DataType`, only contains signed types, including `SHORT`, `INT`, `LONG`, `FLOAT`, and `DOUBLE`. It would be natural (as in most type systems) to include `USHORT`, `UINT`, and `ULONG` as additional types. Instead, one is forced to check for an artificial signedness attribute, completely distinct from the type system. If you still don't see the difference, let me ask you: does the C/C++ API include e.g. ushort/uint/ulong in its list of types? I would predict it does, but maybe I'm wrong and NetCDF uses this artificial attribute in all APIs. – Garret Wilson May 02 '13 at 21:34

It seems that when the CDM data types are mapped to Java, NetCDF will automatically add the attribute _Unsigned = "true" to the variable. So I assume that if I check for that attribute, it will indicate if the value is unsigned or not. This may be exactly what I was looking for; I'll verify tomorrow that it works.

Update: I tried this and it works; moreover, as John Caron indicated in the accepted answer, a NetCDF array has an isUnsigned() method which checks for the _Unsigned attribute.

Garret Wilson
  • Frankly I find this approach convoluted and not even semantically correct. The NetCDF-4 model has unsigned types. The variables in the underlying file use unsigned types. Using a non-Java API I could see that the types are unsigned. In the Java API, to say that "the types are signed" is really a lie, even with a kludge attribute that says "well, really they are unsigned". So what the API should be saying, in my opinion, is that "these values use an unsigned data type, but you can only retrieve them as a signed type". Instead the API is saying, "these types are signed (nudge nudge, wink wink)". – Garret Wilson May 01 '13 at 16:37