Given an `int A`, is there a strong guarantee that `A == (int) (double) A`?

Question

I need a strong guarantee that int x = (int) std::round(y) will always give the correct results (y is finite and "humanly", e.g. -50000 to 50000).

std::round(4.1) can give 4.000000000001 or 3.99999999999. In the latter case, casting to int gives 3, right?

To manage this, I reinvented the wheel with this ugly function:

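#include <cmath>     // std::fmod
#include <concepts>  // std::integral, std::floating_point

// Rounds x to the nearest integer, halfway cases away from zero.
template<std::integral S = int, std::floating_point T>
S roundi(T x)
{
    S r = static_cast<S>(x);     // truncates toward zero
    T r2 = std::fmod(x, T(1));   // fractional part, same sign as x
    if (r2 >= T(0.5)) return r + 1;
    if (r2 <= T(-0.5)) return r - 1;
    return r;
}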

But is this necessary? Or does casting from double to int use the last mantissa bit for rounding?

`std::round(4.1) can give 4.000000000001 or 3.99999999999` — no it absolutely cannot?! — Konrad Rudolph, Jun 15 '22 at 09:15
What do you mean by “strong guarantee”? The standard doesn’t guarantee that all integers are exactly representable by doubles, so from that perspective the answer is “no”. However, in practice, with 32 bit integers and 64 bit IEEE-754 doubles, there’s no issue. — Konrad Rudolph, Jun 15 '22 at 09:16
@Chameleon That’s not what I’m saying. What I’m saying is that this rounding does not introduce floating point errors. And even if it did, the result would *still* always be ≥4. It would never be 3.x. — Konrad Rudolph, Jun 15 '22 at 09:18
@KonradRudolph So if the produced double is equal to or slightly larger (in absolute value) than the correct result, there is a strong guarantee. — Chameleon, Jun 15 '22 at 09:20
You may be interested in [this answer on the same topic](https://stackoverflow.com/a/47153373/11455384). If `std::round` were to introduce floating point errors, it would not be really meaningful to have such a function :) — Fareanor, Jun 15 '22 at 09:20
Perhaps [`std::lrint`](https://en.cppreference.com/w/cpp/numeric/math/rint) is better suited for this than `std::round` if you want an integer type after rounding? — Ted Lyngmo, Jun 15 '22 at 09:21
All integral values -50000 to 50000 can be exactly represented by a floating point type with a mantissa more than (about) 16 bits. I'm not aware of any real-world representation of a `double` with a mantissa of less than 40 bits so you can expect a `double` to easily represent your range of integral values. You might be pushing your luck a bit if you use `float` instead of `double` though. For discussion specific to IEEE floating point, have a look at https://stackoverflow.com/questions/3793838/which-is-the-first-integer-that-an-ieee-754-float-is-incapable-of-representing-e — Peter, Jun 15 '22 at 09:24
std::round(4.1) returns an integer value (note I say value), and neither 4.000000000001 nor 3.99999999999 is an integer value. The literal 4.1 in the source file will be converted to its binary representation when compiled into object code, and as this is implementation-defined it could be either of these, but round still rounds to the nearest integer value, so the result of std::round(4.1) will be 4.0 on any platform. std::lround does not exist to 'remove confusion', as the documentation for std::round already does that. std::lround hides the cast and *might* add range checking, so it is preferred. — Trevor, Jun 15 '22 at 11:15
What does "humanly" mean here? What is the significance of -50k to 50k in those terms? — TylerH, Jul 12 '22 at 18:11

Aykhan Hagverdili · Accepted Answer · 2022-06-15T13:34:06.003

6

Assuming int is 32 bits wide and double is 64 bits wide (and assuming IEEE 754), all values of int are exactly representable in a double.

That means std::round(4.1) returns exactly 4. Nothing more nothing less. And casting that number to int is always 4 exactly.

edited Jun 15 '22 at 13:34

answered Jun 15 '22 at 09:25

Aykhan Hagverdili

28,141
6
41
93

score 3 · Answer 2 · answered Jun 15 '22 at 11:01

3

std::round(4.1) can give 4.000000000001 or 3.99999999999. In the latter case, casting to int gives 3, right?

No, it cannot. The result of std::round is always an integer, exactly, with no rounding error.

I need a strong guarantee that int x = (int) std::round(y) will always give the correct results (y is finite and "humanly", e.g. -50000 to 50000).

C++ inherits its floating-point model from C, and, per C 2018 5.2.4.2.2 12, double is capable of representing at least ten-digit integers, so [−50,000, +50,000] is well within its range. It is even within the range of float, which is capable of representing six-digit integers. This requirement extends back to C 1990.

Given an int A, is there a strong guarantee that A == (int) (double) A?

No, the C++ standard does not impose an upper limit on the width of int nor a relationship between the precision of int (number of bits it uses for the value, excluding the sign bit) and the precision of double (number of bits or other digits in its significand), so a C++ implementation may have an int with more precision than double.

answered Jun 15 '22 at 11:01

Eric Postpischil


I would add that the rounding error, if it occurs, occurs during compilation and not during the call to std::round. Also, given that the standard does not define the range of double you are correct that there is no strong guarantee, but given that he states the integer has the range -50000 to 50000 there will not be any practical system where this cannot be guaranteed, so it isn't strong but as close to strong as you will ever likely see. – Trevor Jun 15 '22 at 11:29
@Trevor: For an integer variable `y`, the rounding to `double` to pass it to `std::round` cannot occur during compilation, except in cases where the compiler can trace its value to determine it at compilation time. The C++ standard does set minimum requirements on the properties of a `double`, by inheritance from C, as I stated, and these require a precision sufficient to represent all ten-digit integers with no error. In particular, the C++ `<cfloat>` header provides the facilities of the C header `<float.h>`, for which `DBL_DIG` must be at least ten. – Eric Postpischil Jun 15 '22 at 11:46
@Trevor, to round a number to integer there's no rounding error, introduced by the rounding operation. It is an exact operation. – Luis Colorado Jun 15 '22 at 18:47
@EricPostpischil, I was talking about 'std::round(4.1)' and not 'std::round(y)', where y is an integer. Obviously if y is an integer then there cannot be any rounding error on conversion to double because integers are by definition already 'round' and can be exactly represented in binary. The rounding error I am talking about occurs when the *literal* 4.1 is converted into its binary representation in order to be stored in the compiled object file. Equally obviously, in such cases the compiler can trace its value as that is what *literal* literally means. – Trevor Jun 17 '22 at 11:24
@LuisColorado, I was a bit loose with the language. By 'rounding error' I really mean base conversion error, i.e. the error that occurs when converting a base 10 floating point number to a binary floating point number, due to the fact that not all terminating decimal numbers have a terminating binary representation. This is the error that occurs during compilation. The real rounding error occurs when the binary representation is passed to std::round. Base conversion error is the difference between 4.1 and its nearest binary representation; rounding error is the difference between this and 4.0. – Trevor Jun 17 '22 at 11:40
@Trevor: There is no error when the `4.1`, having been converted to `double`, is passed to `std::round`. `std::round` receives exactly the value passed to it. There is no rounding error in computing the function, because `std::round` is defined to return the result of rounding its argument to an integer, so 4 is the correct result of the function, not an error. – Eric Postpischil Jun 17 '22 at 12:36
@Trevor: Also, conversion between bases is a floating-point function and is specified in IEEE 754, so the error that occurs in converting `4.1` to `double` is a rounding error just as in arithmetic operations, although it also may be more specifically called a base-conversion error. (Additionally, the cause of the error is not specifically that the binary representation is not terminating but that it does not fit in the destination, as can be seen by the fact that numbers with binary representations that terminate but do not fit will still incur rounding errors.) – Eric Postpischil Jun 17 '22 at 12:39
@Trevor, as Eric tells you, there's no rounding error in the rounding operation from `4.1`. The main problem you can have is that 4.1 itself is not exactly representable as a binary floating point number, so there will be a small difference between the actual binary value, and the two surrounding exact rational numbers representable in binary. – Luis Colorado Jun 24 '22 at 11:39

Luis Colorado · Answer 3 · 2022-06-15T10:43:12.490

0

std::round(4.1) can give 4.000000000001 or 3.99999999999. In the latter case, casting to int gives 3, right?

That's true. 4.1 can be seen as 4.0 (which, being an integer, has an exact floating-point representation) plus 0.1, which is exactly 1/10. The problem arises when you try to round a number close to that to some number of digits after the decimal point (rounding to an integer multiple of 0.1, 0.01, 0.001, etc.).

If you are using decimal floating point (which C compilers normally don't), then you are lucky, as 0.1 is 10^(-1), which has an exact representation in the machine. But as a binary floating point number, it has an infinite expansion, 0.000110011001100110011001100...b, and depending on where you cut it off you will get one value or another, but you will never get the exact decimal value (with a finite number of digits).

But the way round() works is not that... it first adds 0.5 (which is exactly representable as a binary floating point number) to the number (this results in an exact operation, no rounding error emerges from it), and then cuts the integer part (which is also an exact operation), meaning that you are getting always an exact integer result (which is perfectly representable as an exact floating point, if the original number was). The rounding is equivalent to this set of operations:

(int)(4.1 + 0.5);

so you will get the integer part of 4.6 after adding the 0.5 (or of something like 4.60000000000000003 or 4.59999999999999998; either way it is truncated to 4.0, which is also exactly representable in binary floating point format), so you will never get a wrong answer when rounding to an integer. You could get a wrong result for something close to 4.5 (which can round to 4.0 instead of correctly rounding to 5.0), but .5 happens to be exactly 0.1b in binary, and so it is not affected.

Beware, though, that rounding to multiples of a negative power of ten (0.1, 0.01, ...) is not guaranteed to be exact, as none of those numbers is exactly representable in binary floating point. All of them have infinite binary expansions, and because the expansion is cut off at some point, each is stored as a value slightly above or below the true one (whichever is closer), so the rounding will not work exactly.

edited Jun 15 '22 at 10:43

answered Jun 15 '22 at 09:48

Luis Colorado



You shouldn't start your answer by agreeing with something that isn't true. I get what you are trying to say, but the statement 'std::round(4.1) can give 4.000000000001 or 3.99999999999' is patently false. You obviously meant to agree with the statement that '4.1 can give 4.000000000001 or 3.99999999999', but these are two different statements. – Trevor Jun 15 '22 at 11:21
The C and C++ standards do not specify how `round` is implemented, the implementation imagined in this answer is defective, and the statement “no rounding error emerges from it [adding .5]” is false. Consider ½−2^−54, which is representable in the ubiquitous IEEE-754 binary64 format. `round` for this should return zero, but adding .5 produces one (because the real-number-arithmetic result, 1−2^−54 is not representable, so it is rounded to 1), and then truncating produces one. – Eric Postpischil Jun 15 '22 at 17:51
@EricPostpischil, I say _the rounding is equivalent_. I have not said about any specific implementation, and, as that equivalent implementation works, I feel confident that probably you have read my post too quickly. You say _is defective_, could you clarify why is it defective? This was the implementation used for a looong time until some guy found a flaw... Please, in the future, abstain from commenting on my posts.... as it seems you are pursuing me to find a flaw in my posts. I'm a C developer for more than 45 years now. – Luis Colorado Jun 15 '22 at 18:21
I can be in the line, but I don't write _defective_ code to confound anybody. The above snippet is pseudocode, not ANSI C or any other C variant. Analyse it with a K&R standard in mind. – Luis Colorado Jun 15 '22 at 18:26
By the way, I'm not talking about the IEEE-754, but about systems in which binary floating point representation is made, or decimal floating point (not so ubiquitous, as it is implemented in almost no processor at all, but it is also a valid implementation). The rounding problems that affect floating point are due to a mathematical property of the numbering base, and have no relationship with the language standard used. – Luis Colorado Jun 15 '22 at 18:29
@Trevor, I start my answer citing a paragraph in the question. I cannot be made responsible for something said by others. Sorry. I have not said that rounding `4.1` to integer will give inexact results; the OP said that. I'm trying to follow the OP's line of reasoning, although the example (and I agree with you) was badly chosen. But suppose you wanted to round `4.16` to the first digit after the decimal point, to get `4.2`, and instead you get `4.2000000000000003` (in some binary floating point, not necessarily IEEE-754). This can happen, as in this case you divide by 10 in the... – Luis Colorado Jun 15 '22 at 18:31
... process, which is something that will give you an inexact answer. – Luis Colorado Jun 15 '22 at 18:35
(a) Re “You say is defective, could you clarify why is it defective?”: I presented an example for which it fails, ½−2^−54 in IEEE-754 binary64, and have tested it. – Eric Postpischil Jun 15 '22 at 19:50
(b) Re “I have not said about any specific implementation”: The answer presents an implementation: “[it] first adds 0.5 (which is exactly representable as a binary floating point number) to the number (this results in an exact operation, no rounding error emerges from it), and then cuts the integer part (which is also an exact operation), meaning that you are getting always an exact integer result (which is perfectly representable as an exact floating point, if the original number was).” That implementation is defective, as my example proves. – Eric Postpischil Jun 15 '22 at 19:50
(c) Re “This was the implementation used for a looong time until some guy found a flaw...”: The length of time the error was not found does not negate the fact the implementation is defective, as proven by the example. – Eric Postpischil Jun 15 '22 at 19:50
(d) Re “Please, in the future, abstain from commenting on my posts...”: No, incorrect answers deserve comment and correction, and there is no reason you should hold yourself immune from it. – Eric Postpischil Jun 15 '22 at 19:51
(e) Re “as it seems you are pursuing me to find a flaw in my posts”: This is false and there is no merit to it. I found the question in the common stream of recently modified questions and do not follow you. Further, this is an inappropriate forum for your imagined grievances. You may take them up with the moderators or the operators of the site, and they will confirm your complaint has no merit, or, quite likely, ignore it due to its lack of merit. – Eric Postpischil Jun 15 '22 at 19:53
(f) Re “I'm a C developer for more than 45 years now”: The length of time you have been a C developer does not negate the fact the implementation is defective, as proven by the example. Further, if length of experience were grounds for determining correctness or incorrectness of code, you would fail on that measure too, as I have been programming longer. – Eric Postpischil Jun 15 '22 at 19:54
(g) Re “By the way, I'm not talking about the IEEE-754, but about systems in which binary floating point representation is made, or decimal floating point”: The example using IEEE-754 is one possible example and proves the described implementation is defective. – Eric Postpischil Jun 15 '22 at 19:56
(f) You have not shown any example. You use an expression that is not acceptable even as pseudocode, due to evaluation ambiguities: in `(1/2)-2^-54` you use two operators without defining them (`^`, which could be bitwise exclusive or or exponentiation, and `-`, which could be unary minus). In any case, if in C notation you mean the value `0.5 - 1/pow(2.0, 54)`, this number (which you could have written as `0.5 * (1.0 - DBL_EPSILON)`) is mentioned in the answer as an error that incorrectly rounds to 0, but not due to inexact calculations, but due to the ... – Luis Colorado Jun 16 '22 at 05:48
... rounding error of a previous calculation, that results in producing the exact value 0.5. You don't need to be so elaborate.... if you use `0.5 - 1e-300` you will get the same error, but written in a C readable and compatible way. – Luis Colorado Jun 16 '22 at 05:50
@LuisColorado: “½−2^−54” uses common notation. It is one half minus two to the power of negative 54. You can write C code for that value as `.5 - 0x1p-54` or `0x1.fffffffffffffp-2`. It cannot be written as `0.5 * (1.0 - DBL_EPSILON)`, as that produces a different value, which could be written as `0x1.ffffffffffffep-2`. There is no “rounding error of a previous calculation,” and the expression does not “results in [producing] the exact value 0.5”. As I stated initially, ½−2^−54 is representable in IEEE-754 binary64; its value is that of `0x1.fffffffffffffp-2`, not `0.5`. – Eric Postpischil Jun 16 '22 at 09:58
The expression you use is wrong, as it is out of the range for the significand of a double value, which has only 51 bits. So the result of that expression is exactly 0.5, and rounding it gives 1.0. Please, next time use a better example. – Luis Colorado Jun 17 '22 at 04:44
@LuisColorado: The most significant bit of ½−2^−54 is 2^−2 (because ½−2^−54 is slightly less than ½, so the position value of its first bit is ¼), and the least significant bit in it is 2^−54. 2^−2 to 2^−54 spans 53 bits, which is the width of the mathematical significand of an IEEE-754 binary64. Furthermore, as I informed you, I tested this. The value can be written as `0x1.fffffffffffffp-2`, and executing `#include <stdio.h>` / `int main(void) { double x = 0x1.fffffffffffffp-2; printf("%a\n", x); }` in an implementation using binary64 for `double` prints “0x1.fffffffffffffp-2”. – Eric Postpischil Jun 17 '22 at 10:39
@LuisColorado: For the full test, executing `#include <math.h>` / `#include <stdio.h>` / `double BadRound(double x) { return trunc(x + .5); } int main(void) { double x = 0x1.fffffffffffffp-2; printf("%g\n", round(x)); printf("%g\n", BadRound(x)); }` prints “0” and “1”. – Eric Postpischil Jun 17 '22 at 10:41
@LuisColorado: The width of the IEEE-754 significand, called its precision, is 53 bits, not 51, per IEEE 754-2019 table 3.2 in clause 3.3 and table 3.5 in clause 3.6. – Eric Postpischil Jun 17 '22 at 10:44
You are not wrong to cite the paragraph, nor are you responsible for its correctness, but you are wrong to agree with it. After citing the paragraph you include the words 'That's true'. It is these words that I take issue with and not the citation that precedes them. – Trevor Jun 17 '22 at 11:10
@EricPostpischil, you are using the document... I have no access to it. Thanks for the correction, but for that round to work you need to temporarily extend the precision (by one exact bit) so that the last (least significant) bit can be used in the calculation, which requires bit manipulation rather than plain C arithmetic operations. Ideally, the expression I used is mathematically correct, and you know of one single case (well, one single case per power of the base of the numeration system) in which the rounding is done wrongly (as I already said in my response). You win. – Luis Colorado Jun 24 '22 at 12:22
