0

I have a seemingly simple problem: I want to detect whether a floating point addition in Fortran will overflow by doing something like the following:

real*8 :: a, b, c

a = ! some value
b = ! some value

if (b > DOUBLE_MAX - a) then
  ! handle overflow
else
  c = a + b

The problem is that I don't know what DOUBLE_MAX should be. I'm aware of how floating point numbers are represented according to IEEE 754 but the largest value representable by a double precision floating point number seems to be too large for a variable of type real*8 (i.e. if I try to assign 1.7976931348623157e+308 to such a variable gfortran complains). C and C++ have predefined constants/generic functions for this purpose but I couldn't find a Fortran equivalent.

Note: I'm aware that real*8 is not really part of the standard but there seems to be no other way to reliably specify that a floating point number should use the double precision format.

Peter
  • 2,919
  • 1
  • 16
  • 35
  • I looked for a duplicate here but didn't find it, although I expected to be able to find one. There are lots of posts including 'huge' as a phrase, unfortunately. Feel free to point this to a duplicate instead and I'll delete my answer. – Ross Dec 04 '19 at 18:52
  • 3
    *"there seems to be no other way to reliably specify that a floating point number should use the double precision"* There are several ways, but `real*8` is NOT one of them. It specifies a real with 8 bytes of storage size, not double precision in any meaning I am aware of. For Fortran double precision use `double precision` or a kind constant set to `kind(1.0_dp)`. If you want the IEEE double precision floating type, use the IEEE module, do not use `real*8`, that guarantees nothing. – Vladimir F Героям слава Dec 04 '19 at 18:52
  • Strictly real*8 specifies and guarantees nothing at all, it may not even compile, not being Fortran. Look at https://stackoverflow.com/questions/838310/fortran-90-kind-parameter to see how you should be doing it, please throw this archaic extension in the bin. – Ian Bush Dec 04 '19 at 22:00
  • You might also want to look at standard Fortran's way of handling Floating point exceptions – Ian Bush Dec 04 '19 at 22:02

2 Answers2

3

Something like this?

real(REAL64) function func( a, b )
  use, intrinsic :: iso_fortran_env, only: REAL64, INT64
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_set_flag, IEEE_OVERFLOW, IEEE_QUIET_NAN
  implicit none
  real(REAL64), intent(in) :: a, b
  real(REAL64), parameter  :: MAX64 = huge(0.0_REAL64)

  if ( b > MAX64-a ) then
    ! Set IEEE_OVERFLOW flag and return NaN
    call ieee_set_flag(IEEE_OVERFLOW,.true.)
    func = ieee_value(func,IEEE_QUIET_NAN)
  else
    func = a + b
  end if

  return
end function func

All I could find for intrinsic ieee_exceptions module is:
https://github.com/gcc-mirror/gcc/blob/master/libgfortran/ieee/ieee_exceptions.F90

For setting NaN value see post.

jcerar
  • 467
  • 4
  • 13
  • 1
    ieee_value (as in the link) is a cleaner way of setting the NAN value. Also I think I would do the addition and then test for the overflow rather than as written, it just feels as though I am making less assumptions about the floating point implementation that way. – Ian Bush Dec 05 '19 at 10:28
  • @Ian Bush indeed it is much cleaner way. I changed to `ieee_value`... – jcerar Dec 05 '19 at 10:52
  • Neat, but isn't REAL64 only available from Fortran 2003? – Peter Dec 05 '19 at 11:38
  • Nvm, this at least seems to compile with a Fortran 95 compiler so I'll accept this as the correct way to do what I want. And @IanBush: You're suggesting checking for overflow with `ieee_get_flag`, right? That seems more reasonable. – Peter Dec 05 '19 at 11:47
  • 2
    Yes - I'd do the addition and use ieee_get_flag (and also make sure about the halting mode). I'd also probably use ieee_selected_real_kind as well. And all this is Fortran 2003, which is the standard you should be using nowadays – Ian Bush Dec 05 '19 at 12:07
2

There are likely better ways to detect overflow, but the precise answer to your question is to use the huge function. HUGE(a) returns the largest possible number representable by the type a.

Ross
  • 2,130
  • 14
  • 25