1

Suppose I have some variable to double precision, x, that I read from a binary file. When printed this variable x = 384.6433096437924

I want to 'bump' it to quad precision.

integer, parameter :: qp = selected_real_kind(32,307)
integer, parameter :: dp = selected_real_kind(15,307)
real(kind=dp) :: x_double
real(kind=qp) :: x_quad

x_double = 384.6433096437924
x_quad = real(x_double, kind=dp)

print *, x_double, x_quad, 384.64330964379240_qp

This outputs,

384.6433096437924, 384.64330964379240681882947683334351, 384.643309643792400000000000000000009

Which of the quad numbers is the 'best' - i.e. most accurate, consistent etc. value to use? The one with 'junk' or the one without?

I understand (I think!) the issues relating to the exact representation of a float on a computer, but cannot see which is the way to go.

I compile as gfortran -fdefault-real-8

user1887919
  • 829
  • 2
  • 9
  • 24
  • 1
    Please check that your code is exact and that the results are copied exactly and tell us your compiler options. Please see the discussion below the answer. Your results are strange. See also https://stackoverflow.com/questions/16370206/numerical-precision-in-fortran-95 – Vladimir F Героям слава May 22 '18 at 15:37

1 Answers1

1

The one with what you call “junk” has the least loss of information from the double value.

When 384.6433096437924 is converted to double-precision floating-point, the result is a value in a binary-based format, and that value is different from the original value. If you know the original value had 16 significant decimal digits, then printing the double with 16 decimal digits can produce the original value (in most cases), and that makes sense.

However, if the number has been through various computations, and there is no reason to believe it represents a number that has some exact number of decimal digits, then truncating or altering its value for display does not provide any useful information. It may make it easier for a human to read, but the entire computed value is the best information we have about the result, so printing the entire value conveys the most information.

Alternatively, if the original number can be converted directly to quad precision without going through double precision, then the error from the conversion will generally be less, and this method should be preferred.

Ultimately, which form is best for display depends on the purposes of the display. Do you want to preserve the most information for future use? Or do you want to make the result easily readable by a human?

Finally, the “junk” you show is strange. The result of converting 384.6433096437924 to IEEE-754 basic 64-bit binary floating-point should be 384.64330964379240640482748858630657196044921875, but you show 384.64330964379240681882947683334351. I do not see how that value arose.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
  • That value may have arosed because the literal `384.6433096437924` is **single** precision. https://stackoverflow.com/questions/6146005/is-there-a-better-double-precision-assignment-in-fortran-90 https://stackoverflow.com/questions/16370206/numerical-precision-in-fortran-95 But maybe OP's compiler uses double as the default. – Vladimir F Героям слава May 22 '18 at 15:15
  • @VladimirF: Hmm, IEEE-754 32-bit would have produced 384.643310546875. But there may be something going on with the `selected_real_kind(15,307)` that I do not understand. I know IEEE-754 reasonably well but have not followed FORTRAN for many years. – Eric Postpischil May 22 '18 at 15:26
  • My results after executing OP's code in normally set `gfortran` are indeed `384.64331054687500 384.643310546875000000000000000000000 384.643309643792399999999999999999998` – Vladimir F Героям слава May 22 '18 at 15:33
  • `gfortran -fdefault-real-8` returns `384.64330964379241 384.643309643792406404827488586306572 384.643309643792399999999999999999998 ` – Vladimir F Героям слава May 22 '18 at 15:35
  • I compile as gfortran -fdefault-real-8. – user1887919 May 22 '18 at 17:14
  • 1
    @user1887919 That is a very important information, please edit your question and add it as I asked you before. Still, as you can see, I got somewhat different results. Please double-check everything and post the *exact* compiler options, the exact code and the compiler version. Again, as I asked you to do already. – Vladimir F Героям слава May 22 '18 at 21:57