
As per the DICOM standard, floating point values can be stored using a Value Representation of Decimal String. See Table 6.2-1. DICOM Value Representations:

Decimal String: A string of characters representing either a fixed point number or a floating point number. A fixed point number shall contain only the characters 0-9 with an optional leading "+" or "-" and an optional "." to mark the decimal point. A floating point number shall be conveyed as defined in ANSI X3.9, with an "E" or "e" to indicate the start of the exponent. Decimal Strings may be padded with leading or trailing spaces. Embedded spaces are not allowed.

"0"-"9", "+", "-", "E", "e", "." and the SPACE character of Default Character Repertoire. 16 bytes maximum

So I would be tempted to simply use a 64-bit double (IEEE 754-1985) to represent the value in memory in my C code, based on the fact that the input is stored in a maximum of 16 bytes.

Could someone with a little bit more knowledge of X3.9-1978 confirm that this is the best possible representation (compared to arbitrary-precision, float and/or long double)? By best, I mean the representation where a round-trip read/write will be visually lossless. I should be able to read such an ASCII floating point representation from disk, put it into memory, and write it back to disk (as specified above) with maximum accuracy compared to the original value (= machine epsilon when possible). The actual implementation details of how to represent a double as ASCII with only 16 bytes of storage are outside the scope of this question, see here for details.

malat
    IEEE 754 binary64 isn't *quite* good enough to be able to round-trip all these values, but it's pretty close. The only place you're potentially going to lose information is with odd 16-digit integers between `9007199254740993` and `9999999999999999` with no sign. E.g., the 16-character strings `9999999999999997` and `9999999999999996` would both map to the same IEEE 754 binary64 float value. The moment you have a sign, or a decimal point, or an exponent, you have 15 or fewer significant digits, which the binary64 format will handle faithfully. Same for 16-digit integers smaller than 2**53. – Mark Dickinson Sep 17 '15 at 11:50

1 Answer


This is heavily based on Hans Passant's and Mark Dickinson's comments.

Using a binary floating point type to represent decimal values is generally a bad idea, because binary floating point types cannot exactly represent most decimal values (for example, 0.1 has no exact binary64 representation). In particular, you should never use them for processing exact monetary values.

But here, the DICOM spec sets the limit to 16 characters, while the precision of a double is about 15-16 decimal digits (ref.). As soon as your decimal string contains a sign (+/-), a dot (.) or an exponent part (e/E), you will have at most 15 decimal digits and the round trip will be correct. The only problems can occur when you have 16 digits. The example provided by Mark Dickinson is: the 16-character strings 9999999999999997 and 9999999999999996 would both map to the same IEEE 754 binary64 float value.

TL;DR: Hans Passant gave a nice summary: "16 bytes maximum" [is] "exactly as many accurate significant digits you can store in a double. This DICOM spec was written to let you use double. So just use it."


Disclaimer: All values representable in IEEE 754 will be correctly processed, but beware: 1e1024 is an acceptable value for a DICOM Decimal String, yet it is not representable in a double (which is limited to about 1.8e308).

Serge Ballesta