
This is more of a question about best practice in Fortran code writing than about solving an error.

I have the following code sample, with a large array that needs to be passed to a subroutine for some calculation:

program name
    implicit none
    integer, parameter:: n = 10**8
    complex(kind=8) :: x(n)
    integer :: i, nVal 

    nVal = 30
    do i =1,1000
        call test(x,nVal)

        !-----other calculations-----!
        ! after every step nVal changes, and after a few steps nVal converges
        ! e.g. `nVal` starts from 30 and converges to 14 after 10-15 steps, and stays there for the rest of the loop
        ! once `nVal` converges, the `workarray` requires much less memory than it does at the start
    enddo


    contains
    subroutine test(arr,m)
        integer , intent(inout) :: m 
        complex(kind=8), intent(inout) :: arr(n)
        complex(kind=8) :: workarray(n,m) ! <-- large workspace

        !----- do calculation-----------!

        !--- check convergence of `m`----! 
    end

end program name

The internal workarray depends on a value that decreases gradually until it converges, and then stays there for the rest of the run. If I check the memory usage with top, it stays at 27% from start to finish, but after the first few steps the memory requirement should decrease too.
So I modified the code to use an allocatable workarray, like this:

program name
    implicit none
    integer, parameter:: n = 10**8
    complex(kind=8) :: x(n)
    integer :: i, nVal, oldVal
    complex(kind=8), allocatable :: workarray(:,:)


    nVal = 30 
    oldVal = nVal

    allocate(workarray(n,nVal))
    do i =1,1000


        ! all calculation of the subroutine `test` brought to this main code

        !--- check convergence of `nVal`----! 
        if(nVal /= oldVal) then
            deallocate(workarray)
            allocate(workarray(n,nVal))
            oldVal = nVal
        endif

    enddo
end program name

Now, if I watch top, the memory usage starts at about 28% and then decreases, reaching a converged value of about 19%.

Now, my question is how I should code situations like this. The allocatable option does decrease the memory requirement, but it also hampers the readability of the code a little and introduces code duplication in several places. On the other hand, the first option holds the larger amount of memory for the whole run, even though much less would suffice after convergence. So what is the preferred way of coding in this situation?

Eular
  • Why don't you just allocate the work array in the subroutine? That's the way I would do it – Ian Bush Sep 13 '21 at 15:18
  • To be absolutely clear, I would have `workarray` as an allocatable array within the scope of the test subroutine (a minimal sketch of this follows the comments below). This avoids potential issues with putting large arrays on the stack, assuming automatic arrays are implemented that way, and keeps the number of lines over which `workarray` is in scope to a minimum. If the overhead due to the extra allocations/deallocations is significant, you are doing something a bit strange. – Ian Bush Sep 13 '21 at 15:34
  • My actual loops run for about 15000 steps, while convergence is reached within the first 15 or so, so allocation within the subroutine means roughly 15000-15 extra allocations/deallocations, which seems unnecessary. Though I didn't measure its performance hit. – Eular Sep 13 '21 at 15:50
  • I struggle to get the point. You want the program to decrease the occupied memory during runtime when the local array gets smaller? The behaviour will depend on many details of the processor - that means the details of the compiler and also of the operating system and the memory allocator used. – Vladimir F Героям слава Sep 13 '21 at 16:57
  • But certainly do see https://stackoverflow.com/questions/2215259/will-malloc-implementations-return-free-ed-memory-back-to-the-system I think the answers are outdated and as you observe in your case, you do get something back. But it is certainly not something guaranteed nor easy to do. And certainly in no way described by the Fortran standard, it is much lower level. – Vladimir F Героям слава Sep 13 '21 at 16:59
  • Finally, be advised that you are asking about local arrays - automatic or allocatable, but local. Not about dummy arguments; if anything, `arr` is a dummy array, not `workarray`. This may have caused your [last question](https://stackoverflow.com/questions/69143172/fortran-memory-mangement-for-dummy-arrays) to get a different answer than you might have hoped for. Normally I would have changed the title here myself, but I feel you should decide yourself what the main point here is. – Vladimir F Героям слава Sep 13 '21 at 17:02
  • So, if I understand correctly, once a local array is used in a procedure, it just keeps occupying the memory until the end of the program's lifetime? So if I call a single subroutine that uses huge local arrays at the very start of the code, only once and never again, that memory will still be occupied for the rest of the runtime? – Eular Sep 13 '21 at 17:39
  • My understanding of [stackoverflow.com/questions/2215259/…](https://stackoverflow.com/questions/2215259/will-malloc-implementations-return-free-ed-memory-back-to-the-system) and similar is that memory will only rarely be returned from a program to the system, but that freeing memory within a program fairly reliably returns that memory to the program itself for later use. – veryreverie Sep 13 '21 at 18:24
  • @Eular It is quite difficult to return once-allocated memory to the OS. It can be done, but the part of memory you ceased to occupy may be somewhere in the middle of the address space. If allocation is done by moving the program break using `brk()` or `sbrk()` https://stackoverflow.com/questions/6988487/what-does-the-brk-system-call-do then it is hard to go back. But there are various address spaces and pages and all that, so returning some large blocks is possible. See also https://stackoverflow.com/questions/48358229/how-can-i-get-a-guarantee-that-when-a-memory-is-freed-the-os-will-reclaim-that – Vladimir F Героям слава Sep 13 '21 at 18:55
  • I think it is easier to return the memory to the OS with `mmap` (and it is used for large allocations) and hard with `(s)brk` (used for smaller allocations). But this is all very compiler (what malloc is used?) and OS dependent. – Vladimir F Героям слава Sep 13 '21 at 18:56
  • So, when allocate/deallocate decreases the memory usage percentage in `top`, what am I seeing there? – Eular Sep 14 '21 at 01:08
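
For reference, here is a minimal sketch of the local-allocatable variant suggested in the comments above, using the question's own names and kinds; the actual calculation is still elided:

program name
    implicit none
    integer, parameter :: n = 10**8
    complex(kind=8) :: x(n)
    integer :: i, nVal

    nVal = 30
    do i = 1, 1000
        call test(x, nVal)
    enddo

contains
    subroutine test(arr, m)
        integer, intent(inout) :: m
        complex(kind=8), intent(inout) :: arr(n)
        ! Local allocatable workspace: sized to the current m on every call
        ! and automatically deallocated when the subroutine returns.
        complex(kind=8), allocatable :: workarray(:,:)

        allocate(workarray(n, m))

        !----- do calculation-----------!

        !--- check convergence of `m`----!
    end subroutine test
end program name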

1 Answer


I can't help you decide which of the two methods is better; it will depend on how you (or the users of your code) value the potential tradeoff between memory use and CPU use. However, I can suggest a better version of your second method.

Rather than moving `workarray` and the calculation up into the main program, you can keep both local to `test` and give `workarray` the `save` attribute, so that it persists between procedure calls.

This would look something like

program name
  implicit none
  
  integer, parameter :: dp = selected_real_kind(15,300)
  
  integer, parameter:: n = 10**8
  complex(dp) :: x(n)
  integer :: i, nVal 
  
  nVal = 30
  do i =1,1000
    call test(x,nVal)
  enddo
contains
  subroutine test(arr,m)
    complex(dp), intent(inout) :: arr(:)
    integer, intent(inout) :: m 
    
    ! workarray is local to test but saved, so it persists between calls.
    ! (An allocatable variable cannot be given an initialiser in its
    !  declaration, so its allocation status is checked instead.)
    complex(dp), allocatable, save :: workarray(:,:)

    ! (Re)allocate workarray if it is absent or too small.
    if (.not. allocated(workarray)) then
      allocate(workarray(size(arr), m))
    else if (size(workarray, 2) < m) then
      deallocate(workarray)
      allocate(workarray(size(arr), m))
    endif
  end subroutine
end program

If `m` is likely to increase slowly, you may also want to consider replacing `allocate(workarray(size(arr), m))` with `allocate(workarray(size(arr), 2*m))`, such that you get C++ `std::vector`-style memory management.
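
For instance, the reallocation check inside `test` might then become something like this (a sketch of the same check as above, just with geometric growth):

    ! Grow geometrically: reallocate only when m exceeds the current capacity,
    ! and overshoot so that a slowly increasing m triggers only a handful of
    ! reallocations instead of one per increase.
    if (.not. allocated(workarray)) then
      allocate(workarray(size(arr), m))
    else if (size(workarray, 2) < m) then
      deallocate(workarray)
      allocate(workarray(size(arr), 2*m))
    endif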

The main downside of this approach (besides not reducing the memory use) is that you need to be more careful if you want to run parallel code which uses procedures with saved variables.
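
For example, if the outer loop over `i` were parallelised with OpenMP, all threads would otherwise share the single saved `workarray`. One possible way to handle that is sketched below; this is only an illustration (shown as a module procedure so it stands alone), and the `!$omp threadprivate` directive is not part of the code above:

module test_mod
  implicit none
  integer, parameter :: dp = selected_real_kind(15,300)
contains
  subroutine test(arr, m)
    complex(dp), intent(inout) :: arr(:)
    integer, intent(inout) :: m

    complex(dp), allocatable, save :: workarray(:,:)
    ! Give each OpenMP thread its own saved copy of workarray; without this,
    ! concurrent calls to test would race on one shared saved array.
    !$omp threadprivate(workarray)

    ! (Re)allocate this thread's copy if it is absent or too small.
    if (.not. allocated(workarray)) then
      allocate(workarray(size(arr), m))
    else if (size(workarray, 2) < m) then
      deallocate(workarray)
      allocate(workarray(size(arr), m))
    endif

    ! ----- calculation as before -----
  end subroutine test
end module test_mod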

veryreverie