If I have a float f = 50,000 and then I do f*f, is the value returned negative?

Question

So, it's almost time for midterms and the professor gave us some sample questions.

What I THINK the answer is:

We are given a float that is f=50000.

If we do f*f we get 2,500,000,000.

Now, I'm assuming we're working with a 32-bit machine, as that is what we have studied so far. If that is the case, then a 32-bit float holding 2,500,000,000, not being declared unsigned, is considered signed by default. Since 2,500,000,000 is a little over half of the 32-bit range of 4,294,967,296, and it is signed, we would have a negative value returned, so the statement f * f < 0 would be true, right?

I've only been studying systems programming for 4 weeks, PLEASE correct me if I am wrong here.

What do you think is the difference between `float` and `int`? — Mooing Duck, Oct 17 '18 at 17:43
I still have to re-read floating points today, but if my memory isn't failing me then it's either the number of bytes used, or the signedness of float compared to int? — The_Senate, Oct 17 '18 at 17:45
Well, `float` and `int` use the same number of bytes, and have "the same signedness", so... no. It sounds like you have no idea what `float` is, so I think that might be the real question here. — Mooing Duck, Oct 17 '18 at 17:46
I suppose that is true, thank you for your time. However, I have one more question for you. Say we have the SAME thing, but our variable is an integer called I? Would my thesis be correct? — The_Senate, Oct 17 '18 at 17:48
Signed integer overflow has undefined behavior. Unsigned integer overflow wraps around. (Signed integer overflow *typically* wraps around, but don't depend on that; compiler optimizations can do surprising things.) — Keith Thompson, Oct 17 '18 at 17:53
@KeithThompson Right, but suppose it's not overflowing and it's just over the halfway mark? As long as we don't yield a number larger than twice our high order bits, then we shouldn't get the accursed signed overflow, right? Unsigned integer overflow would just do a modulo to reduce the values based on what I have read. Like if we did x >> 33 it would just do x >> 1 on a 32-bit machine, yes? — The_Senate, Oct 17 '18 at 17:58
https://stackoverflow.com/questions/10108053/ranges-of-floating-point-datatype-in-c — pm100, Oct 17 '18 at 18:00
and http://www.ntu.edu.sg/home/ehchua/programming/java/datarepresentation.html — pm100, Oct 17 '18 at 18:02
https://floating-point-gui.de/formats/fp/ is a nice, easy to understand description of how binary floating-point works. — Daniel Pryden, Oct 17 '18 at 18:03
@The_Senate: For type `int`, any operation whose result is greater than `INT_MAX` or less than `INT_MIN` is an overflow, and has undefined behavior. A shift whose right operand is greater than or equal to the width of the (promoted) left operand, or is negative, has undefined behavior. If `int` is 32 bits, `1>>33` has undefined behavior. — Keith Thompson, Oct 17 '18 at 18:18
@KeithThompson Are you absolutely sure about that? I ask because, according to this systems programming book, when C does something like x >> 34 or x >> 40, the shift amount ends up being reduced modulo 32 (this is on 32-bit machines, and these are the book's words, not mine). — The_Senate, Oct 17 '18 at 18:30
@The_Senate Yes. From section 6.5.7p3 of the [C standard](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) regarding bitwise shift operators: *"The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. **If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.** "* — dbush, Oct 17 '18 at 18:37
@dbush If you're correct, then I hope the writers of my book fixed that in the third edition. — The_Senate, Oct 17 '18 at 18:48
@The_Senate Yes, I'm absolutely sure. The behavior is defined by the ISO C standard. You can read the [N1570](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) draft. Bitwise shift operators are covered in section 6.5.7. "Undefined behavior" means that the C standard places no requirements on the behavior. Anything that any implementation does (including the behavior you describe) is consistent with that. If your book describes what one implementation does, it may be correct. What's the book? — Keith Thompson, Oct 17 '18 at 19:09
@SteveSummit Steve, I don't believe in appeal to authority by grace of Occam's Razor. I question everyone and everything, period. I encourage you to do the same. — The_Senate, Oct 17 '18 at 19:28
@KeithThompson Computer Systems: A Programmer's Perspective, 2nd edition. — The_Senate, Oct 17 '18 at 19:29
@SteveSummit: I see the described modulo behavior using gcc on Linux x86_64. For example, `1U << 33 == 2`. (`int` is 32 bits.) This is, of course, one of the infinitely many possible results of undefined behavior. — Keith Thompson, Oct 17 '18 at 19:39

dbush · Answer 1 · 2018-10-17T18:29:06.080

Unlike the int type, which is typically represented as a two's complement number, a float is a floating point type, which means it stores values using a mantissa and an exponent. This means that the typical wrapping behavior seen with signed integer types doesn't apply to floating point types.

In the case of 2,500,000,000, this will actually get stored as 0x1.2A05F2 x 2³¹, which represents the value exactly.

Floating point types are typically stored using IEEE 754 floating point format. In the case of a single precision floating point (which a float typically is), it has 1 sign bit, 8 exponent bits, and 24 mantissa bits (with 23 bits stored, as the high order "1" bit is implied).

While this format can't "wrap" from positive to negative, it is subject to 2 things:

1. Loss of precision
2. Overflow of the exponent

As an example of precision loss, let's use a decimal floating point format with a 3 digit mantissa and a 2 digit exponent. If we multiply 2.34 x 10¹⁰ by 6.78 x 10¹⁰, we get 1.58652 x 10²¹, but because of the 3 digit precision it gets truncated to 1.58 x 10²¹. So we lose the least significant digits.

To illustrate exponent overflow, suppose we were to multiply 2.00 x 10⁶⁰ by 3.00 x 10⁵⁰. We'd get 6.00 x 10¹¹⁰, but because the maximum value of the exponent is 99, this is an overflow. IEEE 754 has a special representation for infinity, which it uses in the case of overflow: the exponent is set to all 1 bits and the mantissa to all 0 bits. The sign bit distinguishes positive infinity from negative infinity.

So, floating points are always greater than 0 and don't "wrap" so to speak, can they be reduced in any way? What keeps floats in check if they just keep growing exponentially? — The_Senate, Oct 17 '18 at 18:00
@The_Senate: You need to read up on the IEEE-754 floating point standard. But the short answer is that floats can be negative, and an "overflowed" float becomes infinity. — Daniel Pryden, Oct 17 '18 at 18:01
@The_Senate No, they're not always greater than 0. There's a separate sign bit for that. — dbush, Oct 17 '18 at 18:02
Hmm Interesting choice of 50,000*50,000 as that is near `float` precision limits. — chux - Reinstate Monica, Oct 17 '18 at 18:16
