0

I am dealing with a 4-fold loop. It is very slow because of the intrinsic function exp() in the inner loop. Here is a small example:

program main
  implicit none
  integer :: n, m, k, w
  real*8 :: a(100), b(100), c(1000), d(4)
  real*8 :: self(4,1000)

  a = 1.0d0
  b = 1.0d0
  c = 1.0d0
  d = 1.0d0

  self=0.0d0
  do n = 1, 100
    do m = 1, 100
      do k = 1, 1000
        do w = 1, 4
          self(w,k) = self(w,k) + exp( ( (c(k)-a(n))**2 + (c(k)-b(m))**2 ) / (2.0d0*d(w)**2) )
        enddo
      enddo
    enddo
  enddo

  ! my optimization:
  self=0.0d0
  do n = 1, 100
    do m = 1, 100
      !do k = 1, 1000
        do w = 1, 4
          self(w,:) = self(w,:) + exp( ( (c(:)-a(n))**2 + (c(:)-b(m))**2 ) / (2.0d0*d(w)**2) )
        enddo
      !enddo
    enddo
  enddo

end program

It looks like the fortran intrinsic function exp() is not efficient. Yet, I do not want to rewrite the exp() to be some equivalent expressions.

Davide Fiocco
  • 5,350
  • 5
  • 35
  • 72
Memories
  • 246
  • 1
  • 11
  • I'm under the assumption that a=b=c=d=1 is just an example? – kvantour May 02 '20 at 09:34
  • 2
    The problem is not that exp is inefficient, executing 100x100x1000x4 exp functions is just heavy – kvantour May 02 '20 at 09:35
  • Yes, this is just a demo. The actual loop size is 1million * 500 * 500 * 4 * 3, the outmost 1 million loop is parallelized – Memories May 02 '20 at 09:39
  • 5
    your exponent contains 2 parts which only contain 3 indices, you can precompute them and then only perform a final multiplication. This will already do some speed up. (precompute all `exp((c(k)-a(n)**2/(2*w(k)))` and all `exp((c(k)-b(m)**2/(2*w(k)))`) that leads to 2x100x1000x4 exp computations instead 2x100x100x1000x4 – kvantour May 02 '20 at 09:50
  • 1
    And for the bigger calculation if you can't afford the memory break it into blocks where you precompute 1 block, do all the work with that block, and move onto the next – Ian Bush May 02 '20 at 09:57
  • @Ian Bush, OK, though the memory is not a problem yet – Memories May 02 '20 at 10:00
  • Also please don't use the non-standard, non-portable real*8 which is not and never has been part of Fortran. Please see https://stackoverflow.com/questions/838310/fortran-90-kind-parameter – Ian Bush May 02 '20 at 10:33
  • 1
    Remarkably similar to https://stackoverflow.com/questions/61174309/how-to-optimize-this-fortran-subroutine-with-many-loops/61175101. The principal lesson in the answers to that question apply here too - don't bury computations too deeply inside loop nests and repeat them needlessly. – High Performance Mark May 02 '20 at 10:56
  • 2
    @kvantour, No, you can't do this. There is no such array called `w` of size 1000 and all four indices are used inside the `exp` function, there is `k`, `m`, `n`, and `w`. – AboAmmar May 03 '20 at 02:16
  • @AboAmmar That is a typo and copy paste from the typo. You could also have said that I missed a bracket.: `exp( (c(k)-b(m))**2/(2*d(w)) )` they are of this form and I hope I did not make a mistake now. – kvantour May 03 '20 at 13:58

1 Answers1

0

I posted my answer here, based on some nice comments above:

program main
  implicit none
  integer :: n, m, w
  real*8 :: a(100), b(100), c(1000), d(4)
  real*8 :: self(1000,4), exp_tmp(1000,4,100)

  a = 1.0d0
  b = 1.0d0
  c = 1.0d0
  d = 1.0d0

  do n = 1, 100
    do w = 1, 4
      exp_tmp(:,w,n) = exp( 0.5d0 * ( (c(:)-a(n))/d(w) )**2 )
    enddo
  enddo

  self=0.0d0
  do n = 1, 100
    do m = 1, 100
      do w = 1, 4
        self(:,w) = self(:,w) + exp_tmp(:,w,n) * exp_tmp(:,w,m)
      enddo
    enddo
  enddo

end program

I do not know whether the loops could be vectorized to be compact.

Memories
  • 246
  • 1
  • 11