I have to reimplement printf(3) in C without using any function that would do the conversion for me.

I thought I was done once I understood, thanks to you guys, how %a works: How %a conversion work in printf statement?

Then I realized I didn't understand how the rounding was done, so I asked: C printf float rounding, and after your help I thought I was done again.

Now that %a works exactly like the official one, I thought %La would do exactly the same as %a but with a long double, since the man page only says:

Modifier    a, A, e, E, f, F, g, G
l (ell)     double (ignored, same behavior as without it)
L           long double

And I discovered that it outputs something completely different :'(

double a_double = 0.0001;
long double a_long_double = 0.0001;
printf("%a\n", a_double);       // => 0x1.a36e2eb1c432dp-14
printf("%La\n", a_long_double); // => 0xd.1b71758e21968p-17

The %a result always begins with 1. and now I don't understand at all what %La is doing.

Could you help me understand the process that transforms 0.0001 to 0xd.1b71758e21968p-17 ?

EDIT: the part that I really don't understand is why %a always outputs something that starts with 1. and %La doesn't.

EDIT2: To be even more precise: why does %a choose to output 1. ...p-14 and %La choose to output d. ...p-17?

ItsASecret
  • @Wintermute: But the whole output is rounded anyways... The two numbers that are printed are identical, but the chosen exponent is different. – Bill Lynch Mar 20 '15 at 16:48
  • @BillLynch Yes exactly my question! – ItsASecret Mar 20 '15 at 16:50
  • @Wintermute: there are no differences between both numbers (besides the type): if you shift one representation by three bits, you'll get the other. – AntoineL Mar 20 '15 at 17:35
  • @BillLynch: Your point is correct, just a nit-pick: the very purpose of %a is to NOT round the number to output! – AntoineL Mar 20 '15 at 17:35
  • @Wintermute: if you don't understand what's actually going on, please don't tell people that it's "floating point rounding errors". There is no rounding error at work here; in fact, there is no rounding at all in the formatting. – Stephen Canon Mar 20 '15 at 17:43
  • Yes, yes, I retract the earlier comment; I thought it was something more simple than it was; that was my bad. One thing though: I never meant to say anything about rounding errors in formatting but in the machine representation. Those most certainly exist, although they're not the deciding factor I thought they were. (Or at all influential, taking a second look. T'was silly.) – Wintermute Mar 20 '15 at 18:48
  • On my FreeBSD AMD64 system, both the double and long double print as 0x1.a36e2eb1c432dp-14 so this is definitely something that is platform-dependent. – juhist Mar 20 '15 at 19:13

3 Answers

Any (finite, non-zero) floating-point number has four distinct (and valid) hexadecimal floating-point representations:

  1. one whose highest hexadecimal digit is between 8 and F
  2. another whose highest hexadecimal digit is between 4 and 7
  3. another whose highest hexadecimal digit is 2 or 3
  4. another whose highest hexadecimal digit is 1

and they have successively increasing exponent values: each form's exponent is one more than the previous one's. You can pass from one form to the next by shifting the significand right by one bit; this may push an extra digit out to the right. The C Standard places no restriction on which representation an implementation uses for the %a conversion.

With IEEE float (24-bit significand) or the usual double extended (64-bit significand, as with long double on i386 Linux), the number of bits is divisible by 4, so it is customary to use the first form for a normalized number: it is the form that needs the fewest hexadecimal digits to represent the number fully (respectively 5 and 15 digits after the hexadecimal point).

With IEEE double (53-bit significand), on the other hand, it looks better to use the fourth form with 13 hexadecimal digits after the point: that way all the digits shown actually represent data. The same would happen with IEEE quad (formally binary128, 113-bit significand), with 28 hexadecimal digits after the point.

These representations also have the nice property of matching the in-memory representation of the numbers (this is why a footnote in the C99 standard guides implementations toward them).

If we now look at your question, the first example (double) matches the guidelines above, having 1 as first digit and then 13 hexadecimal digits after the point.

The second example is trickier. First, a double constant is stored into a long double variable: depending on the compiler, the constant may be rounded to double precision (as seems to be the case here). This means that the bits after the 53rd are all zeroes. Then the %La conversion is applied, also following the guidelines above and thus choosing the first representation, which starts with a D (1101 in binary, to be compared with 1.A, which is 1 1010 in binary). Here too there are only 13 hexadecimal digits after the point, because the implementation omits the trailing zero digits (which are a result of the rounding).

Note that you cannot use printf's %a conversion for a float as such: a float argument is promoted to double before printf ever sees it.

AntoineL

It's not clear what aspect of the result is surprising you. If it's the difference in exponents and digits, that's just because of poor normalization. Note that for the first result you get a 1 before the radix point and for the second you get a d before it. Shift d right 3 bits and you have a 1, and the lower bits (101) shift into the next position after the radix point, giving a, as expected.

As for how 0.0001 is transformed to 0xd.1b71758e21968p-17, it's just a matter of finding the binary floating point value that's closest to the decimal number 0.0001. The mechanism for how that's done efficiently is somewhat involved, but the concept is simple.

R.. GitHub STOP HELPING ICE
  • Why does one output something that always starts with '1.' (%a) and not the other (%La)? – ItsASecret Mar 20 '15 at 16:46
  • @ItsASecret `binary64` has 52 explicit bits, and one implicit leading bit. 52 is divisible by 4 so it would be wasteful and annoying to print it as anything other than `1.xxxxxxxxxxxxx`. On the other hand, the extended-double you're dealing with has 64 bits in total for the significand and is not encoded with an implicit bit. To avoid overflowing into an additional character, most `printf` implementations print it as `x.xxxxxxxxxxxxxxx`. – Pascal Cuoq Mar 20 '15 at 17:43
  • @ItsASecret: I consider that an implementation flaw. From a standpoint of presenting the most useful information at a given precision specification, you want the digit before the radix point to always be in the range `8..f`. Printing a 1 results in gratuitous loss of precision. But it is a valid implementation. – R.. GitHub STOP HELPING ICE Mar 20 '15 at 18:05
  • @PascalCuoq: That only applies if the printing precision is sufficient to represent the value exactly (which is true with no explicit precision specified). If the caller limits the precision, it's undesirable to use a leading 1. – R.. GitHub STOP HELPING ICE Mar 20 '15 at 18:06
  • @R.. To display 53 bits, you are always "gratuitously" losing at some end; the benefit of using the right-aligned representation, and thus of losing at the left end, where it is always 1., is that it matches the in-memory representation. As a debugging tool, I consider it worthwhile. It also makes the implementation slightly easier if you are hand-coding, which again is a good point from a debugging perspective (I dislike chasing libc bugs.) – AntoineL Mar 20 '15 at 18:22
  • @AntoineL: I'm talking about what happens if you use `%.1a`. With a leading 1 you only get 5 bits of precision. With a leading digit in the range `8..f`, you have 8 bits. – R.. GitHub STOP HELPING ICE Mar 20 '15 at 18:53
  • @R.. About `%.1a`, I would agree with you, and for small precision I would do it your way, maxing the printed accuracy. But you'll agree with me that plain `%a` (or `%La`) is a different beast; and the fact is, the meaning of the precision, and its default value, are not at all homogeneous between the various floating-point specifiers. – AntoineL Mar 20 '15 at 19:02

You don't say what kind of machine you are working on here, but the fact that you get a 0xd.... result from %La implies that this is a machine where long double is an unnormalized floating point type (such as 80-bit floats on an 8087). With a normalized float, you will always get a result that starts with 0x1...., as you were guessing, but for unnormalized floats there are multiple representations of the same number, which is what you are seeing here.

The whole point of the %a conversion spec is to allow printing a binary floating point number as text in such a way that it can later be read back with scanf, yielding a bit-identical value on a machine with the same floating point representation.

Chris Dodd
  • Oh ok I had no idea that this would have any impact, I am on OS X 10.10.2 – ItsASecret Mar 20 '15 at 17:41
  • Do not see in the C spec that the first digit must be `0x1`: "A double argument representing a floating-point number is converted in the style [−]0xh.hhhhp±d, where there is one hexadecimal digit (which is nonzero if the argument is a normalized floating-point number and is otherwise unspecified) before the decimal-point character ..." C11dr §7.21.6.1 8 – chux - Reinstate Monica Mar 20 '15 at 17:42
  • The leading bit of 80-bit extended doubles is always 1 for normal numbers. The fact that this leading bit is explicit does not change the way it should be printed. The fact that 64 is divisible by 4 makes it tempting to print no more than 16 hex digits in total, though. – Pascal Cuoq Mar 20 '15 at 17:46
  • A footnote in the C spec specifically addresses a leading digit that is not 0 or 1: "Binary implementations can choose the hexadecimal digit to the left of the decimal-point character so that subsequent digits align to nibble (4-bit) boundaries". This contradicts this answer's "With a normalized float, you will always get a result that starts with 0x1." – chux - Reinstate Monica Mar 20 '15 at 17:48
  • @chux: while the spec doesn't formally require it, it would be a rather bizarre implementation that did extra work to produce something else for a floating point format with a hidden 1 bit. – Chris Dodd Mar 20 '15 at 17:52
  • @PascalCuoq: Since the leading bit is explicit in 8087-extended, you can produce numbers where it is clear. So the printf/scanf code needs to be able to deal with that case. – Chris Dodd Mar 20 '15 at 17:53
  • @PascalCuoq: more exactly, this is true for the 80-bit representation chosen by Intel and Motorola for the minimal-sized double extended format; one can conceptually think of another 80-bit format using one more bit in the significand, having the hidden bit in the external representation, being formally compliant with IEEE 754:1985, and being incompatible; of course the last point is the killer one. ;-) – AntoineL Mar 20 '15 at 17:58
  • @ChrisDodd Since the 387, all Intel FPUs raise an exception on values with the leading bit being 0 while the exponent indicates a normal number. This hardly counts as “dealing with”. Source: http://en.wikipedia.org/wiki/Extended_precision – Pascal Cuoq Mar 20 '15 at 18:08
  • @ChrisDodd W.r.t chux remark: actually not bizarre at all: if you ever use the %a specifier against VAX D-float (56-bit significand, with one hidden bit), you are likely to have the first digit between 8 and F, because it is the natural way it aligns. – AntoineL Mar 20 '15 at 18:10
  • @ChrisDodd ".. printf code needs to be able to deal with that case. (leading bit 0)." Although it is useful to indicate such unnormalized numbers, 1) the C spec does not require equal valued unnormalized (subnormal) / normalized numbers to display differently 2) In a test case, small unnormalized numbers near 0.0 do not print in a way that indicate "unnormal-ness". Still maintain OP is not seeing an unnormalized number. – chux - Reinstate Monica Mar 20 '15 at 18:58
  • @chux: You are correct in that this is yet another place where the C spec is "broken" -- underspecified, allowing behavior that completely breaks the original intent of the feature in question. – Chris Dodd Mar 20 '15 at 19:06
  • @AntoineL: using "not bizarre" and "VAX D-float" together is an oxymoron :-) – Chris Dodd Mar 20 '15 at 19:12
  • @ChrisDodd: about the underspecification: actually, there are real reasons to not force any representation in the Standard; as you can read in the comments below http://stackoverflow.com/a/29171691/656988 there is a base contradiction in intent between "short" precision like `%.1a`, where you prefer to maximize the displayed accuracy; and `%a` (or hexadecimal floating-point like on http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture) where you would prefer to align the *nibbles*. Ah, and sorry to exhibit again the bizarre :-D – AntoineL Mar 20 '15 at 19:30