
Try the following code:

```c
#include <stdio.h>

unsigned char TEST_COMPILER_AS13 = 1.18 * 100;

int main(void) {
    printf("%d", TEST_COMPILER_AS13);
    return 0;
}
```

Using C on https://www.codechef.com/ide, the result is 117.

If we replace 1.18 with 1.17, the result is 116; yet if we use 1.19, 1.20, 1.15, etc., the results are correct: 119, 120, 115.

Using a different online compiler, say http://codepad.org/, the results for 1.18 and 1.17 are okay, yet if you try 1.13, 1.14, 1.15 you get 112, 113, 114 respectively.

I'm scratching my head and can't understand why this happens. Note: I have tried different compilers (DIAB, COSMIC, MinGW, etc.) and all have a similar issue. So what am I missing here, or how are these floating-point operations done?

Note: to work around this, you can cast the expression to float, so the declaration would be as follows:

```c
unsigned char TEST_COMPILER_AS13 = (float) (1.18 * 100);
```

I'm open to your answers; I really want to understand how this works: why it works for some numbers and not for others, why compilers differ in the way they handle it, and whether there are compiler options that affect this behaviour.

  • It's classic FP/truncation/whatever, who cares? – Martin James Dec 09 '15 at 19:46
  • [What Every Computer Scientist Should Know About Floating-Point Arithmetic](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) – 001 Dec 09 '15 at 19:47
  • MSVC the answer is `118`. – Weather Vane Dec 09 '15 at 19:47
  • Step 1, the `double` representations might not be exact. Step 2, you have your variable types all mixed up, between `unsigned char` and `int` and `double`. New question. Why does my ladder fall over when I use the wrong side? – Weather Vane Dec 09 '15 at 19:50
  • @WeatherVane the question here is why it fails! I pretty much know how to solve it; I want to know what's behind it. If 1.18 works, did you try 1.13? Also, the more specific question: why do compilers handle it in different ways, and what is the relation between 1.18 and 1.13, 1.17, etc.? – Hossam Soffar Dec 09 '15 at 19:53
  • Nope, I didn't bother, I would not even code like that. I thought I had explained why. – Weather Vane Dec 09 '15 at 19:54
  • @WeatherVane: then that makes me wonder why you even cared to test it with MSVC, or follow the answer! If you don't know the answer, I believe you should wait for someone with experience to answer rather than making an "I don't care" statement. – Hossam Soffar Dec 09 '15 at 19:57
  • @JohnnyMopp Thanks a lot. That could help me. – Hossam Soffar Dec 09 '15 at 19:59
  • If you knew how to fix it, why did you bother wasting our time asking about it? – Martin James Dec 09 '15 at 19:59
  • @HossamSoffar because I took the trouble to make sure I know what your code is doing, before commenting, and that was even before the downvoted answer. And because it's crap code, that's why I don't test any other values. – Weather Vane Dec 09 '15 at 19:59
  • on codechef.com it works correctly with "C++" and "C++14", does not work correctly with "C99 Strict". – Mike Nakis Dec 09 '15 at 20:02
  • @MikeNakis: yes, that's exactly my question. Why doesn't it work? – Hossam Soffar Dec 09 '15 at 20:05
  • I don't know. I am just reporting my findings, so that people don't think you are dreaming things up. I am really annoyed by the downvotes and by people's dismissive attitude. – Mike Nakis Dec 09 '15 at 20:06
  • What? Of course OP is not dreaming things up. The only surprise is that such results are a surprise. – Martin James Dec 09 '15 at 20:07
  • @MartinJames: if you just know that this is wrong, and you don't know why it's wrong, and you don't even bother to find out, you'd be no different than a dumb machine. – Hossam Soffar Dec 09 '15 at 20:10
  • @HossamSoffar I'm not excusing my attitude, but it would be better focussing on what does work, and move on from techniques that do not. – Weather Vane Dec 09 '15 at 20:11
  • @MartinJames if it is not surprising to you, then explain it. If you don't want to explain it, remain silent. Pythagoras bothered with inventing his theorem despite the fact that he could obviously count the squares. "Who cares" is very un-scientific, anti-understanding, and counter-productive. – Mike Nakis Dec 09 '15 at 20:12
  • I don't need to know exactly, to bit level. I could find out, but that would mean a HUGE AMOUNT OF WORK, disassembling and debugging each compiler output. That would be madness, pointless and, if anyone should do it, the OP should. I don't care because it's FP, assigned to an unsigned char, printed with a `%d` signed-int specifier. That's just not anything I would do, ever. Again - operations like this, who cares, (other than possibly OP's prof/TA)? – Martin James Dec 09 '15 at 20:14
  • I mean, it's 117.999999999999999999whatever, truncated to 117. Who cares about the whatever? – Martin James Dec 09 '15 at 20:23
  • Some compilers will chuck out 118.000000000000000000whatever instead, which gets truncated to 118. The whatever does not matter. – Martin James Dec 09 '15 at 20:27
  • You are aware that, with common FP standards, some of those numbers cannot exist in a computer? Why do you continually ask for more information than you already have? With the most common FP formats, 1.18 and 1.22 do not exist, and cannot exist, in your computer. How many times do you have to be told this? What more could you possibly want? Those numbers do not exist. There is no accurate representation in the FP binary format most commonly used. It is not possible to represent the numbers exactly. Those numbers cannot be stored in the bits. They get truncated when assigned to uint. – Martin James Dec 09 '15 at 21:24
  • Oh, BTW, those numbers cannot be represented in common FP formats. If you need to know more detail, use your debugger to step through the executables generated by your set of compilers. You will find that 1.18 etc. have no exact binary representation and, as a result, different compilers/libraries/FP hardware/whatever produce slightly different outputs that, when truncated, result in apparently gross errors. It's just a well-known fact of computer life that many numbers cannot be represented in common FP formats. – Martin James Dec 09 '15 at 21:31

1 Answer


The way you have it, the floating-point value is truncated to an integer type (`unsigned char`). To round to nearest, use this instead:

```c
unsigned char TEST_COMPILER_AS13 = 1.18 * 100 + 0.5;
```

While this may "fix" it for this particular case, there will always be precision issues with floating point calculations. See the link posted by @JohnnyMopp in a comment.


[EDIT] To see what a particular compiler "thinks" `1.18 * 100` is, you may print it to full precision (at most 17 significant decimal digits in the case of `double`):

```c
printf("%.16f * 100 = %.16f\n", 1.18, 1.18 * 100);
```
  • If you're going to recommend rounding, recommend `round()`, or `lround()`, or any of the other proper functions. Don't recommend broken `+0.5`. – EOF Dec 09 '15 at 19:57
  • @EOF In this context the values are small enough to fit into a byte once rounded. I did not recommend it as a general rounding solution, and neither is `+0.5` *always* broken. – dxiv Dec 09 '15 at 20:00
  • @dxiv I believe rounding is not an option here. I'm just looking for why this happens and why compilers really handle it in different ways. – Hossam Soffar Dec 09 '15 at 20:11
  • @HossamSoffar It happens because 1.18 does not have an exact representation in the IEEE 754 floating point format. From the online calculator at http://www.binaryconvert.com/result_double.html?decimal=049046049056 the best approximation in 64-bit precision (`double`) is `1.17999999999999993782751062099`. – dxiv Dec 09 '15 at 20:24
  • @dxiv same goes for 1.22: its most accurate representation is `1.219999999999999973354647409E0`. Why is that one correct while the other is not? – Hossam Soffar Dec 09 '15 at 20:26
  • @HossamSoffar Different compilers may handle floating point (slightly) differently. VC++ alone has 3 different options (/fp:precise, /fp:strict, /fp:fast). – dxiv Dec 09 '15 at 20:30
  • Corner case: [In this context the values are small enough to fit into a byte once rounded.](http://stackoverflow.com/questions/34187562/issue-with-floating-points-representation#comment56122056_34187846) Adding 0.5 to round often fails a nuanced case: the FP value just below 0.5 as the sum, though less than 1.0, rounds to 1.0. Thus rounding incorrectly. IMO best to use `round()` when able, else use +0.5 (like for global constants, when `round()` cannot be called) and clearly document its limitations. – chux - Reinstate Monica Dec 09 '15 at 20:42
  • @chux You are right in general, but this is a narrow case. I maintain that if (a) the number has at most 3 significant decimal digits, and (b) it rounds to an integer value between 0 and 255, then adding `+0.5` and truncating will *always* match the `round()` function. That's all there is to it in this particular context. – dxiv Dec 09 '15 at 20:48
  • @dxiv this ad-hoc property is completely irrelevant here as of all the numbers involved in question and answer, all either are integers or have about 50 decimal digits. – Pascal Cuoq Dec 09 '15 at 22:16
  • Note having to cite the precise reasons why adding 0.5 is an acceptable substitute to proper rounding to the nearest integer is exactly why one should always call a standard function to round to the nearest integer. – Pascal Cuoq Dec 09 '15 at 22:20
  • @PascalCuoq The property is not so ad-hoc. Given that 0.5 *is* exactly representable in fp format, it can be *proved* that `+0.5` truncation does the correct rounding across a range far wider than the 0-255 discussed here. FWIW `+0.5` is the first thing C FAQ 14.6 mentions at http://c-faq.com/fp/round.html. Anyway, the main point of the answer was *why* to round, not *how* to round - that was only a minor side point, which somehow most comments here seem to have fixated on. – dxiv Dec 09 '15 at 22:57
  • You mean a range containing 0x1.fffffffffffffp-1? – Pascal Cuoq Dec 09 '15 at 23:15
  • @PascalCuoq Point taken. It's not a range per se, but a sub-set of numbers in a range with strictly less than the 52 significant binary digits that a `double` allows. (For context, my earlier comment was about at most 3 significant decimal digits i.e. about 10 bits.) You number uses up all 52 bits, so it doesn't qualify. In fact that number, 0.49999999999999994, is the smallest (positive) failure case for `+0.5` rounding, as elaborated in more detail at http://ericlippert.com/2013/05/16/spot-the-defect-rounding-part-two/. – dxiv Dec 10 '15 at 00:08