23

Why does the value of a float seem to change when it is cast to a double? I've been trying to find out the reason, but I couldn't. Can anybody help me?

Look at the following example.

float f = 125.32f;
System.out.println("value of f = " + f);
double d = (double) 125.32f; 
System.out.println("value of d = " + d);

This is the output:

value of f = 125.32
value of d = 125.31999969482422
arthursfreire

  • Can you provide the specific example(s) that you are seeing this behavior in? It would be better to show an SSCCE: http://sscce.org/ – Shafik Yaghmour Jul 06 '13 at 16:30
  • One word: **precision**. Technically ... the values aren't "changing" ;) – Brian Roach Jul 06 '13 at 16:32
  • Afaik, that's impossible. Doubles have both more precision and more range, so you lose neither when casting in that direction. It may print differently, though. – harold Jul 06 '13 at 16:32

9 Answers

17

The value of a float does not change when converted to a double. There is a difference in the displayed numerals because more digits are required to distinguish a double value from its neighbors, as the Java documentation requires. That requirement appears in the documentation for toString, which is referred to (through several links) from the documentation for println.

The exact value for 125.32f is 125.31999969482421875. The two neighboring float values are 125.3199920654296875 and 125.32000732421875. Observe that 125.32 is closer to 125.31999969482421875 than to either of the neighbors. Therefore, by displaying “125.32”, Java has displayed enough digits so that conversion back from the decimal numeral to float reproduces the value of the float passed to println.

The two neighboring double values of 125.31999969482421875 are 125.3199996948242045391452847979962825775146484375 and 125.3199996948242329608547152020037174224853515625.
Observe that 125.32 is closer to the latter neighbor than to the original value (125.31999969482421875). Therefore, the numeral “125.32” does not contain enough digits to distinguish the original value. Java must print more digits to ensure that a conversion from the displayed numeral back to double reproduces the value of the double passed to println.
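
For readers who want to check these neighbor values themselves, here is a minimal sketch (my addition, not part of the original answer) using Math.nextUp/Math.nextDown and the BigDecimal(double) constructor, which shows exact values:

import java.math.BigDecimal;

public class Neighbors {
    public static void main(String[] args) {
        float f = 125.32f;
        // Exact decimal value of the float and its two nearest float neighbors:
        System.out.println("f       = " + new BigDecimal(f));
        System.out.println("below f = " + new BigDecimal(Math.nextDown(f)));
        System.out.println("above f = " + new BigDecimal(Math.nextUp(f)));

        double d = f; // widening conversion: the value is unchanged
        // The double's neighbors are far closer, so more digits are needed to tell them apart:
        System.out.println("below d = " + new BigDecimal(Math.nextDown(d)));
        System.out.println("above d = " + new BigDecimal(Math.nextUp(d)));
    }
}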

Eric Postpischil
  • in single-precision, the sequence of numbers exactly representable around that number are: `0x42faa3d6, 0x42faa3d7, 0x42faa3d8` (in hex notation, with one bit increments). Anything in-between gets rounded to the closest number. Same can be said for double-precision. btw correct me if I'm wrong but single-precision allows about 9 significant decimal digits precision, so all those extra digits you showed are dropped and ignored – Amro Jul 07 '13 at 04:17
  • @Amro: Extra digits are not dropped or ignored. The arithmetic performed on floating-point objects behaves as if they have exactly the full values, including the values I showed. That is because they do have those full values; the IEEE 754 specification states that they have exactly those values and no others. They are **not** used by the computer as decimal approximations with a few digits. The computer calculates with them in binary, and they have exactly the values specified. – Eric Postpischil Jul 07 '13 at 11:51
  • hmm I guess this detail always confused me, can you comment on this please: http://stackoverflow.com/questions/4227145/in-matlab-are-variables-really-double-precision-by-default ? – Amro Jul 07 '13 at 15:42
  • @Amro: The IEEE 754 standard defines what values are represented by floating-point objects. The standard does not define any notion of significant decimal digits for a binary floating-point format. There simply is no basis for stating that an IEEE-754 binary floating-point object has some number of significant decimal digits. The object has exactly one value that is precisely specified, and it carries no information about significance. It is an exact value. The standard does permit implementations to limit the number of digits they produce when converting the value to a decimal numeral… – Eric Postpischil Jul 08 '13 at 03:10
  • … This is a quality-of-implementation issue. In my view, an implementation that “gives up” after displaying some number of digits and produces zeroes instead of continuing the conversion is low quality. The floating-point object represents a specific number, and the conversion of the object to decimal ought to produce the exact value that is represented, if the user requested or permitted enough digits for that. (If the user requested fewer digits, then a correctly rounded result should be provided.) – Eric Postpischil Jul 08 '13 at 03:12
  • @Amro: An IEEE-754 binary floating-point number represents **exactly** a number (−1)^s · f · 2^e for some sign bit s (0 for + or 1 for −), some fraction or significand f (which has a specified number of bits), and some exponent e (which is in a certain interval). The number is not defined to correspond to any particular number of decimal digits. It is **exactly** the stated value, not the nearest decimal numeral of some number of digits. When somebody states that the format only represents some number of decimal digits, say 16 or 17, that is an approximation and a crude statement of the math. – Eric Postpischil Jul 08 '13 at 03:15
  • @EricPostpischil: The IEEE standard specifies that the sum of two floating-point values should be the value which best represents the numerical sum of their nominal numerical values, and likewise for other operations, but that doesn't mean floating-point values "represent" exact numbers. I would posit that 2000000.1f represents not "the exact quantity 2000000.125", but rather "some numeric quantity which is probably within the range 2000000.0625 to 2000000.1875. If it really represented the former, then why... – supercat Aug 16 '13 at 21:01
  • ...should compilers even accept the notation `2000000.1f` without producing a diagnostic saying it should be `2000000.125f`? I would posit that the whole reason people use `float` is because they would rather get something that's reasonably close to a numerically-correct answer quickly than use some vastly more complicated rational or symbolic-computation type which takes 1000 times as long to compute. – supercat Aug 16 '13 at 21:05
  • @supercat: The IEEE 754-2008 standard does say that a floating-point value represents one specific number. It does not say that a floating-point value represents an interval or some sort of approximation. See clause 3.3. Some people **use** the floating-point format as if it did approximate numbers, and this is the source of many common misconceptions on Stack Overflow. (Start with an incorrect assumption, get incorrect conclusions.) To make correct deductions about floating-point, one must recognize that each floating-point value represents a specific number and derive results from that. – Eric Postpischil Aug 16 '13 at 22:49
  • @supercat: Certainly it would be useful to some people to have a switch to request that the compiler warn when a numeral in source text translated to a different value. This would cause warnings in a good deal of existing code, so there would likely be resistance to making it the default. But it could be a good educational tool as well as a bug deterrent. – Eric Postpischil Aug 16 '13 at 22:52
  • @EricPostpischil: Except on systems where the range of whole numbers representable by a floating-point format exceeds the range representable by the largest integral type (e.g. on my old Turbo-87 Pascal compiler, `Real` could represent all whole numbers up to 2^64, but the largest integer type went up to 32767), floating-point types shouldn't be used in cases where one wants exact results. For a while, the IEEE standard allowed floating-point calculations to be reproducible on different platforms, though compilers have tended back toward using loosey-goosey semantics with intermediate values. – supercat Aug 16 '13 at 23:11
  • @supercat: On representation: Whether floating-point arithmetic should be used when one wants exact results (which often requires considerable care) is a separate question from whether a floating-point value represents an exact number. The latter is true; this is simply the way the standard is written, and it is necessary to describe the arithmetic and rounding. Even though each intermediate result is rounded to a representable value, the final result is a single number. Again, please see IEEE 754-2008 clause 3.3. – Eric Postpischil Aug 16 '13 at 23:16
  • @supercat: On reproducibility: Nothing has changed in this regard; the IEEE 754 standard allows calculations to be reproducible on different platforms. There is even a portion of it that states what is necessary for reproducibility. Compiler support for that is a separate issue. And it is not relevant to this question or this answer. – Eric Postpischil Aug 16 '13 at 23:18
  • @EricPostpischil: Given a 32-bit `float` whose nominal value is exactly 2000000.125, it's possible that it represents the result of a calculation which--if computed precisely--would equal exactly that number, but it's just about as likely to represent the result of a calculation which would--if done precisely--equal 2000000.115 or 2000000.135. By contrast, a `double` whose nominal value was 2000000.125 would be much more likely to have resulted from a calculation whose exact value was 2000000.125 than from one whose exact value was 0.01 larger or smaller. – supercat Aug 16 '13 at 23:18
  • @supercat: That is an abuse of terminology. As specified in IEEE 754-2008, floating-point values do not represent results of operations. They represent numbers (or NaNs). Twisting the language to use the word “represent” in “represent results of a calculation” is unhelpful. You may obtain approximate results by using floating-point arithmetic, but the values still represent specific numbers. If you do not use this fact, you cannot write correct proofs about floating-point arithmetic. Again, please see IEEE 754-2008 clause 3.3. – Eric Postpischil Aug 16 '13 at 23:21
  • @EricPostpischil: Perhaps the confusion has to do with "represent". Suppose I say `Resistance = scaleFactor/log(v1/v2);`. Is it more helpful to say that value represents the combination of bits that results from performing the indicated operations with all the appropriate indicated rounding, or is it more helpful to say that it represents the measured resistance? If one is trying to define what operations should be performed on a bunch of bits, the former definition is more useful, but if one is working with numbers, the latter is more helpful. – supercat Aug 16 '13 at 23:32
  • Please see IEEE 754-2008 clause 3.3. – Eric Postpischil Aug 16 '13 at 23:52
  • @EricPostpischil: I don't have the standards document available, so I can't refer to the part you cite. Most programmers who use float types only need to perform a simple calculation and get a result accurate to within a few parts per million. To avoid losing a full bit of precision at each calculation step it may be necessary to analyze floating-point behavior in detail, but in many real-world usage cases the number of bits in precision in the type minus the number of accurate bits required in the result will exceed the number of steps. Given a choice of regarding numbers in precise terms... – supercat Nov 26 '13 at 20:21
  • ...and after a few hours' calculation determining the value one is going to display to three significant figures would have a floating-point induced uncertainty of no more than 57.3 parts per billion, or regarding numbers in simplistic terms and determining that the value will be within 100 parts per million, which approach is more useful? If only three significant figures of the value will be displayed, is the more precise estimate of the possible uncertainty in any way useful? – supercat Nov 26 '13 at 20:26
  • @supercat: I do not comment here on whether any particular person ought to use floating-point in any particular way. I simply state that the IEEE 754 standard specifies quite clearly that a floating-point value, other than a NaN, represents exactly one value, and the standard specifies that value exactly. Whether you think that is good or bad is of no consequence to the fact that the standard says it, and that results of floating-point operations are defined in terms of these values. Since implementations of IEEE 754 follow these rules, analysis of their behaviors must follow these rules. – Eric Postpischil Nov 26 '13 at 20:55
12
  1. When you convert a float into a double, there is no loss of information. Every float can be represented exactly as a double.
  2. On the other hand, neither decimal representation printed by System.out.println is the exact value for the number. An exact decimal representation could require up to about 760 decimal digits. Instead, System.out.println prints exactly as many decimal digits as allow the decimal representation to be parsed back into the original float or double. There are more doubles, so when printing one, System.out.println needs to print more digits before the representation becomes unambiguous. (The sketch below makes the digit bound concrete.)
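
The “about 760 decimal digits” figure can be checked with a quick sketch (my addition): the exact decimal expansion of the smallest positive double runs to roughly 750 significant digits, yet println never needs anywhere near that many:

import java.math.BigDecimal;

public class ExactExpansion {
    public static void main(String[] args) {
        // Exact decimal expansion of the smallest positive double, 2^-1074:
        String exact = new BigDecimal(Double.MIN_VALUE).toPlainString();
        System.out.println(exact.length());   // 1076 characters: "0." followed by 1074 digits
        // println prints only as many digits as are needed to identify the value uniquely:
        System.out.println(Double.MIN_VALUE); // 4.9E-324
    }
}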
Pascal Cuoq
  • There isn't really any `double` equivalent to the best `float` representation of 3.5E+38. Comparing such a value to the best float representation of any other value above 3.5E+38 will indicate the values are indistinguishable--not necessarily very informative, but correct. Converting that value to `double`, on the other hand, will cause it to erroneously compare *greater* than the best `double` representation of all values below 1.798E+308--an error of hundreds of orders of magnitude. – supercat Sep 04 '13 at 08:02
  • @supercat This answer is about converting a `float` to a `double`. The “best float representation of 3.5E+38” is `+inf` and the float `+inf` converts to the double `+inf` without loss of precision (it's the same `inf`!). How you interpret that infinity is your problem, not the problem of the conversion. A floating-point value (here, `+inf`) represents one value only (here, infinity). You could have made the same argument with 1-ulp intervals around `1.0f` and the double `1.0`, and the argument would be similarly irrelevant. It is a `float`, i.e. a single value, that is converted to `double`. – Pascal Cuoq Sep 04 '13 at 08:21
  • @supercat See “Some common misconceptions (2)” in http://lipforge.ens-lyon.fr/www/crlibm/documents/cern.pdf . De Dinechin is only talking about finite floating-point values there but the same applies to `inf`. The floating-point value `inf` is **not** “a range of values comprising `3.5E+38`”, it is a single value, infinity. The approximation in translating `3.5E38` to `+inf` has already happened before the conversion to `double`, and does not prevent the conversion to `double` from being exact. – Pascal Cuoq Sep 04 '13 at 08:25
  • A floating-point value effectively encapsulates two concepts: what can be said about the computations that produced it, and what should be fed into computations going forward. Given an expression like `float2=float1/0.625f`, if `float1` was 62.5f, then the value of `float2` indicates that the arithmetic result of the last operation was between 13421772.5/134217728 and 13421773.5/134217728, and that future uses of `float2` will use the exact value 13421773/134217728. If `float1` was 3.4E38, the value of `float2` will indicate that... – supercat Sep 04 '13 at 15:10
  • ...an arithmetic result exceeded 3.4028E38 by an unknown amount, and that future uses of `float2` will regard it as infinite. If one wanted to know whether the arithmetic result of a computations that yielded `float2` could be definitively regarded as larger than 0.11 or 1.7E+308, casting the second operand of each comparison to `float` would correctly report that they can't. Casting the `float2` to `double` would suggest that they could. – supercat Sep 04 '13 at 15:24
  • I should also note that in many languages, using default settings, an expression like `float f=16777216f/10f - 1677721f` could legitimately yield either 5/8 or 5033165/8388608, and perhaps other values as well. Such semantics imply to me that the implementers of the *language* don't regard `16777216f/10f` as being precisely 1677721.625, but would regard anything that's closer to 1677721.6 as equally valid. – supercat Sep 04 '13 at 15:42
  • @supercat In Java (assuming the early version or the `strictfp` keyword) and in a standard-compliant C99 compiler that implements IEEE 754 and defines FLT_EVAL_METHOD as 0, `16777216.f/10.f - 1677721.f` unambiguously evaluates to the mathematical fraction 5/8. There is no doubt that the designers of other languages, and the implementers of otherwise fine C compilers, do not care about the minutia of floating-point, but that is no excuse to ignore it. – Pascal Cuoq Sep 04 '13 at 16:19
  • If one specifies strict semantics, the languages indeed work as you say, but a lot of code is compiled with loose semantics; I see no basis for considering floating-point values as representing anything other than approximate quantities when loose semantics are enabled (as they often are in practice). If the extra speed lets approximate semantics *satisfy the programmer's requirements* better than precise semantics would, I wouldn't call them "defective". – supercat Sep 04 '13 at 16:42
  • @supercat I think you read too much into it. The fact that according to the C standard, `int x = 38000;` may not set `x` to `38000` does not mean that the C committee believes that `38000` represents several values. Similarly, C does not specify that `f=16777216f/10f - 1677721f` should set `f` to a definite value because of portability considerations, but that does not mean that a floating-point number does not represent a specific value. – Pascal Cuoq Jan 09 '14 at 15:44
  • Setting an `int` to 38000 will either set it to exactly 38000 or result in Undefined Behavior. If a compiler is allowed to arbitrarily evaluate the expression `16777216f/10f-1677721f` as being equal to 5/8 or 5033165/8388608, and is not required to be consistent, in what sense can that expression be regarded as having a "specific value"? Further, I would posit that when a programmer declares `float`, the programmer may be intending one of three things [BTW, I would think languages could be improved considerably if they had different types to express these meanings]: – supercat Jan 09 '14 at 16:35
  • 1. I would like all computations with this variable to be performed precisely in accordance with IEEE semantics for 32-bit floating-point numbers; 2. Ideally, the computations would be as precise as possible, but I want them fast, and am willing to settle for 32-bit precision; 3. Ideally, the computations would be as precise as possible, but I can only afford 32 bits of storage per number and I'm willing to settle for 32-bit precision. All three types would be stored identically, but should have different rules regarding implicit casts and promotion. – supercat Jan 09 '14 at 16:39
  • In case #1, I would agree that there's no doubt but that a 32-bit variable would represent a precise quantity. In cases #2 and #3, however, I would posit that that `f` would represent a not-very-precise effort at computing 6/10. Careful use of IEEE semantics can yield better results than a "hope for the best" #2 or #3, but if one needs a result accurate to 10%, simple code using "sloppy" math which yields a result that's accurate to 0.3% may be better than carefully-constructed but complex code which uses IEEE semantics to obtain results accurate to 0.002%. – supercat Jan 09 '14 at 16:48
  • “Setting an int to 38000 will either set it to exactly 38000 or result in Undefined Behavior” No. – Pascal Cuoq Jan 09 '14 at 17:12
  • Checking the standard, I guess it's "implementation-defined", and the assignment is not even required to complete (a signal may be raised); that would imply that any implementation must either specify a particular value or specify that a signal be raised. I'm curious as to whether there's any advantage to having it be implementation-defined; on some processors with 32-bit registers but 16-bit memory, where `int` is 16 bits, ensuring that `int i=38000L;` and `int f(){return 38000L;}` yield the same value would require additional instructions. – supercat Jan 09 '14 at 17:36
4

The conversion from float to double is a widening conversion, as specified by the JLS. A widening conversion is defined as an injective mapping of a smaller set into its superset. Therefore the number being represented does not change after a conversion from float to double.
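
A quick way to convince yourself of the injectivity (a sketch of mine, not from the JLS): converting a float to double and back recovers the original bit for bit:

public class WideningRoundTrip {
    public static void main(String[] args) {
        float[] samples = { 125.32f, Float.MIN_VALUE, Float.MAX_VALUE, -0.0f };
        for (float f : samples) {
            double d = f;           // widening conversion
            float back = (float) d; // narrowing back
            // Identical bit patterns: nothing was lost on the way to double.
            System.out.println(Float.floatToIntBits(back) == Float.floatToIntBits(f)); // true
        }
    }
}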

More information regarding your updated question

In your update you added an example which is supposed to demonstrate that the number has changed. However, it only shows that the string representation of the number has changed, which indeed it has due to the additional precision acquired through the conversion to double. Note that your first output is just a rounding of the second output. As specified by Double.toString,

There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double.

Since the adjacent values in the type double are much closer together than in float, more digits are needed to comply with that rule.
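
To put numbers on “much closer”, the spacing between adjacent values (the ulp) around 125.32 can be queried directly; a small sketch of mine:

System.out.println(Math.ulp(125.32f)); // ~7.6E-6  (float spacing)
System.out.println(Math.ulp(125.32d)); // ~1.4E-14 (double spacing)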

Marko Topolnik
  • Please read the original question, to which my original answer is a fully appropriate response, [here](http://stackoverflow.com/revisions/17504833/1). The question has been substantially updated since. – Marko Topolnik Jun 04 '14 at 07:18
3

The 32-bit IEEE-754 floating-point number closest to 125.32 is in fact 125.31999969482421875. Pretty close, but not quite there (that's because 0.32 has a repeating expansion in binary).

When you cast that to a double, it's the value 125.31999969482421875 that will be made into a double (125.32 is nowhere to be found at this point; the information that it should really end in .32 is completely lost), and of course it can be represented exactly by a double. When you print that double, the print routine thinks it has more significant digits than it really has (but of course it can't know that), so it prints 125.31999969482422, which is the shortest decimal that rounds to that exact double (and of all decimals of that length, it is the closest).
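
You can check the “shortest decimal that rounds to that exact double” claim directly; a small sketch of mine:

double d = (double) 125.32f;
String printed = Double.toString(d);                              // "125.31999969482422"
System.out.println(Double.parseDouble(printed) == d);             // true: it round-trips
// One digit fewer no longer identifies the same double:
System.out.println(Double.parseDouble("125.3199996948242") == d); // false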

harold
1

The issue of the precision of floating-point numbers is really language-agnostic, so I'll be using MATLAB in my explanation.

The reason you see a difference is that certain numbers are not exactly representable in a fixed number of bits. Take 0.1 for example:

>> format hex

>> double(0.1)
ans =
   3fb999999999999a

>> double(single(0.1))
ans =
   3fb99999a0000000

So the error in approximating 0.1 in single precision is carried along when you cast to double precision; the result is different from the approximation you get by starting directly in double precision.

>> double(single(0.1)) - double(0.1)
ans =
     1.490116113833651e-09
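
The same demonstration translated to Java (my sketch), printing the raw bit patterns in hex like the MATLAB format hex output above:

double direct = 0.1;    // 0.1 rounded straight to double precision
double viaFloat = 0.1f; // 0.1 rounded to float first, then widened exactly
System.out.println(Long.toHexString(Double.doubleToLongBits(direct)));   // 3fb999999999999a
System.out.println(Long.toHexString(Double.doubleToLongBits(viaFloat))); // 3fb99999a0000000
System.out.println(viaFloat - direct); // ~1.49e-9, matching the MATLAB result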
Amro
  • these approximations can creep up on you in unexpected ways. For example `0.1*3 == 0.3` evaluates to `false`. If you need extra accuracy, use [arbitrary-precision](http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic) libraries – Amro Jul 06 '13 at 17:26
  • Really cogent explanation. Thanks. – Curt Jul 06 '13 at 18:13
  • @Curt: you can find a much better explanation here: http://www.mathworks.com/company/newsletters/news_notes/pdf/Fall96Cleve.pdf (by Cleve Moler, inventor of MATLAB). This [page](http://www.mathworks.com/help/matlab/matlab_prog/floating-point-numbers.html#f2-98690) also has some nice examples – Amro Jul 06 '13 at 18:31
  • This answer does not explain why printing a `float` shows a different numeral than printing a `double` converted from the same value (and which in fact has the same value, because conversion from `float` to `double` does not change the value). – Eric Postpischil Jul 07 '13 at 02:10
  • you are correct, I did not address how `println` chooses the number of decimal digits to display. The point I was making is that the mathematical value `125.32` cannot be exactly represented in single or double precision, and that the approximation stored in `float`, when upcast to `double`, will not be the same as if the literal number had been entered as a `double` to begin with. So everybody is absolutely right in saying that float->double conversion does not lose any precision whatsoever, but the approximation we get for non-exactly-representable numbers is worse if you do the upcasting.. – Amro Jul 07 '13 at 03:51
  • @EricPostpischil: its like asking what is the best approximation you get for the fraction 1/3; in a 5 decimal digits system we get: 0.33333, in 10 digits we get: 0.3333333333. Now if we upcast the 5 digits representation to 10 digits, we can only add zeros, and thus we get a worse approximation: 0.3333300000 than the previous 10 digit one. That was my point :) The issue of how to display them is specific to the language used, which is why I showed the hexadecimal representation in my example above – Amro Jul 07 '13 at 03:56
  • @Amro: The phrasing “the approximation we get for non-exactly-representable numbers are worse if you do the upcasting” suggests that “upcasting” makes the approximation worse. That is false. A correct statement would be that a value converted to `float` and then converted to `double` generally has more error than a value directly converted to `double`. However, that statement, although correct, is misleading because it is not the source of the problem asked about in the question. – Eric Postpischil Jul 07 '13 at 11:49
  • @EricPostpischil: agreed, that phrasing is misleading. Thanks for the clarification – Amro Jul 07 '13 at 15:38
1

As already explained, all floats can be exactly represented as doubles. The reason for your issue is that System.out.println performs some rounding when displaying the value of a float or a double, and the rounding methodology is not the same in both cases.

To see the exact value of the float, you can use a BigDecimal:

import java.math.BigDecimal;

float f = 125.32f;
System.out.println("value of f = " + new BigDecimal(f));
double d = (double) 125.32f;
System.out.println("value of d = " + new BigDecimal(d));

which outputs:

value of f = 125.31999969482421875
value of d = 125.31999969482421875
assylias
0

It won't work in Java because, by default, Java treats real-number literals as double. If you write a float value without the float suffix, e.g. 123.45 instead of 123.45f, the literal is taken as a double, and assigning it to a float causes a compile-time error about possible loss of precision.
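
A minimal illustration of that point (my snippet; the commented-out line is the one the compiler rejects):

// float bad = 123.45;       // compile error: possible lossy conversion from double to float
float ok   = 123.45f;        // the 'f' suffix makes the literal a float
float cast = (float) 123.45; // an explicit cast also works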

0

The representation of the values changes due to the contracts of the methods that convert numerical values to a String, namely java.lang.Float#toString(float) and java.lang.Double#toString(double), while the actual value remains the same. There is a common part in the Javadoc of both aforementioned methods that spells out the requirements for a value's String representation:

There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values

To illustrate the similarity of significant parts for values of both types, the following snippet can be run:

package com.my.sandbox.numbers;

public class FloatToDoubleConversion {

    public static void main(String[] args) {
        float f = 125.32f;
        floatToBits(f);
        double d = (double) f;
        doubleToBits(d);
    }

    private static void floatToBits(float floatValue) {
        System.out.println();
        System.out.println("Float.");
        System.out.println("String representation of float: " + floatValue);
        int bits = Float.floatToIntBits(floatValue);
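        // IEEE 754 single precision: sign (1 bit) | exponent (8 bits, bias 127) | mantissa (23 bits)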
        int sign = bits >>> 31;
        int exponent = (bits >>> 23 & ((1 << 8) - 1)) - ((1 << 7) - 1);
        int mantissa = bits & ((1 << 23) - 1);
        System.out.println("Bytes: " + Long.toBinaryString(Float.floatToIntBits(floatValue)));
        System.out.println("Sign: " + Long.toBinaryString(sign));
        System.out.println("Exponent: " + Long.toBinaryString(exponent));
        System.out.println("Mantissa: " + Long.toBinaryString(mantissa));
        System.out.println("Back from parts: " + Float.intBitsToFloat((sign << 31) | (exponent + ((1 << 7) - 1)) << 23 | mantissa));
    }

    private static void doubleToBits(double doubleValue) {
        System.out.println();
        System.out.println("Double.");
        System.out.println("String representation of double: " + doubleValue);
        long bits = Double.doubleToLongBits(doubleValue);
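        // IEEE 754 double precision: sign (1 bit) | exponent (11 bits, bias 1023) | mantissa (52 bits)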
        long sign = bits >>> 63;
        long exponent = (bits >>> 52 & ((1 << 11) - 1)) - ((1 << 10) - 1);
        long mantissa = bits & ((1L << 52) - 1);
        System.out.println("Bytes: " + Long.toBinaryString(Double.doubleToLongBits(doubleValue)));
        System.out.println("Sign: " + Long.toBinaryString(sign));
        System.out.println("Exponent: " + Long.toBinaryString(exponent));
        System.out.println("Mantissa: " + Long.toBinaryString(mantissa));
        System.out.println("Back from parts: " + Double.longBitsToDouble((sign << 63) | (exponent + ((1 << 10) - 1)) << 52 | mantissa));
    }
}

In my environment, the output is:

Float.
String representation of float: 125.32
Bytes: 1000010111110101010001111010111
Sign: 0
Exponent: 110
Mantissa: 11110101010001111010111
Back from parts: 125.32

Double.
String representation of double: 125.31999969482422
Bytes: 100000001011111010101000111101011100000000000000000000000000000
Sign: 0
Exponent: 110
Mantissa: 1111010101000111101011100000000000000000000000000000
Back from parts: 125.31999969482422

This way, you can see that the value's sign and exponent are the same, while the mantissa was extended with zeros, retaining its significant part (11110101010001111010111) exactly.

The bit-extraction logic for the floating-point number parts is taken from: 1 and 2.

Pavel
-1

Both are what Microsoft refers to as "approximate number data types."

There's a reason. A float has a precision of 7 digits, and a double 15. But I have seen it happen many times that 8.0 - 1.0 = 6.999999999. This is because floating-point types are not guaranteed to represent a decimal fraction exactly.

If you need absolute, invariable precision, go with a decimal or integral type.
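
For example (my sketch, not from the answer), BigDecimal constructed from a String keeps decimal fractions exact where double cannot:

import java.math.BigDecimal;

public class ExactDecimal {
    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);        // 0.30000000000000004
        System.out.println(0.1 + 0.2 == 0.3); // false
        // BigDecimal built from Strings does exact decimal arithmetic:
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        System.out.println(sum);                                       // 0.3
        System.out.println(sum.compareTo(new BigDecimal("0.3")) == 0); // true
    }
}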

Curt
  • "approximate" is quite an awkward way to describe what an IEEE floating point number is, since it *exactly* represents a very precisely defined set of numbers. – Marko Topolnik Jul 06 '13 at 16:38
  • This doesn't address the question at all. – Zong Jul 06 '13 at 16:39
  • "Approximate-number data type" is precisely what Microsoft calls float and real: http://msdn.microsoft.com/en-us/library/ms173773.aspx – Curt Jul 06 '13 at 16:43
  • @Curt If one starts calling a floating-point type “real”, then the first thing one needs to mention is that it is approximate: as a datatype to store real numbers in, it certainly is. I don't think that documentation from Microsoft about floating-point types in SQL should be considered appropriate reference material for discussion of floating-point either in itself or in Java. – Pascal Cuoq Jul 07 '13 at 12:06
  • @MarkoTopolnik: From the standpoint of the code which performs low-level computations, IEEE floats are precisely-defined types. From the standpoint of *consumer* code, however, if a program reads two `float` values (say x=1.0 and y=10.0) and computes `float z=x/y;`, it is far more likely that the programmer regards `z` as holding an imperfect representation of the entered fraction than a precise representation of the fraction 13421773/134217728. – supercat Sep 04 '13 at 07:54
  • @supercat Yes, I agree with all that. – Marko Topolnik Sep 04 '13 at 09:50