
New to C programming, and I've been told to avoid unions which in general makes perfect sense and I agree with. However, as part of an academic exercise I'm writing an emulator for hardware single-precision floating point addition by doing bit manipulation operations on unsigned 32-bit integers. I only mention that to explain why I want to use unions; I'm having no trouble with the emulation.

In order to test this emulator, I wrote a test program. But of course I'm trying to find the bit representation of floats on my hardware, so I thought this could be the perfect use for a union. I wrote this union:

#include <stdint.h>  /* uint32_t */

typedef union {
  float floatRep;
  uint32_t unsignedIntRep;
} FloatExaminer;

This way, I can initialize a float with the floatRep member and then examine the bits with the unsignedIntRep member.

This worked most of the time, but when I got to NaN addition, I started running into trouble. The exact situation was that I wrote a function to automate these tests. The gist of it was this:

void addTest(float op1, float op2){
  FloatExaminer result;
  result.floatRep = op1 + op2;

  printf("%f + %f = %f\n", op1, op2, result.floatRep);
  //print bit pattern as well (cast so the argument matches %lx)
  printf("Bit pattern of result: %08lx\n", (unsigned long)result.unsignedIntRep);
}

OK, now for the confusing part:

I added a NAN and a NAN with different mantissa bit patterns to differentiate between the two. On my particular hardware, it's supposed to return the second NAN operand (making it quiet if it was signalling). (I'll explain how I know this below.) However, passing the bit patterns op1=0x7fc00001, op2=0x7fc00002 would return op1, 0x7fc00001, every time!

I know it's supposed to return the second operand because I tried, outside the function, initializing the values as integers and reading them through float pointers, as below:

uint32_t intRep1 = 0x7fc00001;
uint32_t intRep2 = 0x7fc00002;
float *op1 = (float *) &intRep1;
float *op2 = (float *) &intRep2;
float result = *op1 + *op2;
uint32_t *intResult = (uint32_t *)&result;
printf("%08x", *intResult); //bit pattern 0x7fc00002

In the end, I've concluded that unions are evil and I should never use them. However, does anyone know why I'm getting the result I am? Did I make a stupid mistake or assumption? (I understand that hardware architecture varies, but this just seems bizarre.)

Cappielung
  • I don't see why you are violating strict pointer aliasing and not using the `union`. – Weather Vane Feb 11 '17 at 20:43
  • Hey, like I said, new to C, so I'm not really sure what you're suggesting. Could you explain? – Cappielung Feb 11 '17 at 20:46
  • You might like to [read this](http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) previous question. – Weather Vane Feb 11 '17 at 20:51
  • Did you look at the generated code? – melpomene Feb 11 '17 at 20:54
  • Thanks for the link, @WeatherVane. Interesting...so my use of union was best practice? Does that mean that casting to pointers of different types creates the undefined behavior and not the other way around? – Cappielung Feb 11 '17 at 21:01
  • Reverting to the object of your enquiry, I don't understand why you expect to get meaningful results when you do operations on two `NAN`s. What benefit can there be in distinguishing the first `NAN` from the second `NAN` and the resulting third `NAN` when they are all "not a number"? – Weather Vane Feb 11 '17 at 21:13
  • Well, it's an academic exercise, not a practical application. You ask a fair question, but my initial question arose because seemingly identical inputs created different outputs and I didn't understand why. @melpomene, I don't really know how to examine the generated code. I could look at the bits in the .exe...but I don't know what I'd be looking for. – Cappielung Feb 11 '17 at 21:21
  • Your compiler undoubtedly has a flag which will cause it to produce readable assembler rather than an object file. On gcc and clang, it is `-S`. Also see http://gcc.godbolt.org – rici Feb 11 '17 at 21:34
  • Perhaps that's the floating point equivalent of *undefined behaviour* and the system is being as kind as it can be under the circumstances, without continuing to produce anything more that is consistently meaningful, and without crashing. – Weather Vane Feb 11 '17 at 21:36
  • Note that the 'signalling NAN' is the floating point hardware equivalent of an exception, like division by zero. It is a 'trap' representation, i.e. doing *any* manipulation with it yields undefined behaviour. The scope for undefined behaviour is much wider than you may expect, as [uninitialised floating point variables may also lead to undefined behaviour](http://yosefk.com/blog/fun-with-ub-in-c-returning-uninitialized-floats.html). The pointer based casting approach is also wrong: that violates aliasing rules, and therefore also leads to undefined behaviour. – user268396 Feb 11 '17 at 23:59

1 Answer


I'm assuming that when you say "my particular hardware", you are referring to an Intel processor using SSE floating point. But in fact, that architecture has a different rule, according to the Intel® 64 and IA-32 Architectures Software Developer's Manual. Here's a summary of Table 4.7 ("Rules for handling NaNs") from Volume 1 of that documentation, which describes how NaNs are handled in arithmetic operations. (QNaN is a quiet NaN and SNaN is a signalling NaN; I've only included the rules for two-operand instructions.)

  • SNaN and QNaN
    • x87 FPU — QNaN source operand.
    • SSE — First source operand, converted to a QNaN.
  • Two SNaNs
    • x87 FPU — SNaN source operand with the larger significand, converted to a QNaN
    • SSE — First source operand, converted to a QNaN.
  • Two QNaNs
    • x87 FPU — QNaN source operand with the larger significand
    • SSE — First source operand
  • NaN and a floating-point value
    • x87/SSE — NaN source operand, converted to a QNaN.

SSE floating-point machine instructions generally have the form op xmm1, xmm2/m32, where the first operand is the destination register and the second operand is either a register or a memory location. The instruction then does, in effect, xmm1 <- xmm1 (op) xmm2/m32, so the first operand is both the left-hand operand of the operation and the destination. That's the meaning of "first operand" in the above chart. AVX adds three-operand forms such as vaddss xmm1, xmm2, xmm3/m32, where the destination xmm1 can be a different register from both sources; the "first source operand" in the chart is then xmm2. The x87 FPU uses a stack-based architecture, where the top of the stack is always one of the operands and the result replaces either the top of the stack or the other operand. Note that the x87 rules in the above chart do not try to decide which operand is "first"; they choose by NaN type and, when that doesn't settle it, by comparing significands.

Now, suppose we're generating code for an SSE machine, and we have to handle the C statement:

a = b + c;

where none of those variables are in a register. That means we might emit code something like this: (I'm not using real instructions here, but the principle is the same)

LOAD  r1, b  (r1 <- b)
ADD   r1, c  (r1 <- r1 + c)
STORE r1, a  (a  <- r1)

But we could also do this, with (almost) the same result:

LOAD  r1, c  (r1 <- c)
ADD   r1, b  (r1 <- r1 + b)
STORE r1, a  (a  <- r1)

That has precisely the same effect, except for additions involving NaNs (and only when using SSE). Since the C standard does not specify which NaN an operation on two NaN operands returns, there is no reason why the compiler should care which of these two options it chooses. In particular, if r1 already happened to hold the value of c, the compiler would probably choose the second option, since it saves a load instruction. (And who is going to complain? We all want the compiler to generate code which runs as quickly as possible, no?)

So, in short, the order of the operands of the ADD instruction will vary with the intricate details of how the compiler chooses to optimize the code, and with the state of the registers at the moment the addition is emitted. It is possible that the use of a union affects this, but it is equally or more likely that it has to do with the fact that, in your code using the union, the values being added are function arguments and are therefore already in registers.

Indeed, different versions of gcc, and different optimization settings, produce different results for your code. And forcing the compiler to emit x87 FPU instructions (with gcc, for example, -mfpmath=387) produces yet different results, because the hardware operates according to a different logic.


Note:

If you want some bedtime reading, you can download the entire Intel SDM (currently 4,684 pages / 23.3MB, but it keeps on getting bigger) from their site.

rici
  • Thanks for the detailed answer! My head's swimming a bit, but I think I follow. (Though I might take a pass on reading through Intel's operations manual) – Cappielung Feb 16 '17 at 22:31