2

I want to add elements to a 1d matrix mat, subject to a condition as in the test program below. In Fortran 2003 you can add an element

mat=[mat,i]

as mentioned in the related question Fortran array automatically growing when adding a value. Unfortunately, this is very slow for large matrices. So I tried to overcome this, by writing the matrix elements in an unformatted file and reading them afterwards. This turned out to be way faster than using mat=[mat,i]. For example for n=2000000_ilong the run time is 5.1078133666666661 minutes, whereas if you store the matrix elements in the file the run time drops to 3.5234166666666665E-003 minutes.
The problem is that for large matrix sizes the file storage.dat can be hundreds of GB... Any ideas?

program test


implicit none

integer, parameter :: ndig=8
integer, parameter :: ilong=selected_int_kind(ndig)
integer (ilong), allocatable :: mat(:)
integer (ilong), parameter :: n=2000000_ilong
integer (ilong) :: i, cn 
logical, parameter :: store=.false.
real(8) :: z, START_CLOCK, STOP_CLOCK


open(1, file='storage.dat',form='unformatted')

call cpu_time(START_CLOCK)

if(store) then 

 cn=0
 do i=1,n
   call random_number(z)
   if (z<0.5d0) then 
     write(1) i
     cn=cn+1
   end if 
 end do


 rewind(1); allocate(mat(cn)); mat=0

 do i=1,cn
  read(1) mat(i)
 end do


else 

 allocate(mat(1)); mat=0
 do i=1,n
   call random_number(z)
   if (z<0.5d0) then 
    mat=[mat,i]
   end if 
 end do



end if




call cpu_time(STOP_CLOCK)
print *, 'run took:', (STOP_CLOCK - START_CLOCK)/60.0d0, 'minutes.'


end program test
geom
  • 195
  • 8
  • If your mega-array is 100s of GB, and if your computer does not have sufficient memory, I wonder if you need to have the entire mega-array in memory at one time? If not, just fill up an array as the code proceeds, when it is full, write to file, empty it, and fill it up again ... – High Performance Mark Feb 09 '23 at 13:20
  • If you end up with a file of "hundreds of GB", there's absolutely no chance that your `mat` can fit into memory, whatever the method you are using (with a file, without a file, with reallocation, with simulated pushback...): you simply have to much data to store. – PierU Feb 09 '23 at 14:04

2 Answers2

3

If the data file has hundreds of gigabytes, than there can may be no solution available at all, because you need so much RAM memory anyway for your array. Maybe you made the mistake of storing the data as text and then the memory size will be somewhat lower, but still tens of GB.

What is often done, when you need to add elements one-by-one and you do not know the final size, is growing the array geometrically in steps. That means pre-allocate an array to size N. When the array is full, you allocate a new array of size 2*N. When the array is full again, you allocate it to 4*N. And so on. Either you are finished or you exhausted all your memory.

Of course, it is often best to know the size of the array beforehand, but in some algorithms you simply do not have the information.

  • It is the case where you don't know the size of the array beforehand. Thanks for the suggestion Vladimir – geom Feb 09 '23 at 13:44
  • I agree, do not copy the array (using `mat=[mat,i]`) every time an object is added. Only do it as when you run out of storage and need to expand the array by at least doubling the size. – John Alexiou Feb 09 '23 at 14:13
1

Maybe you need a dynamic container such as C++'s std::vector, with a push_back() function.

The following is a simplified version. You probably ought to check the allocation to make sure that you don't run out of addressable memory.

Note the need for random_seed.

module container
   use iso_fortran_env
   implicit none

   type array
      integer(int64), allocatable :: A(:)
      integer(int64) num
   contains
      procedure push_back
      procedure print
   end type array

   interface array                               ! additional constructors
      procedure array_constructor      
   end interface array

contains

   !----------------------------------------------

   function array_constructor() result( this )   ! performs initial allocation
      type(array) this
      allocate( this%A(1) )
      this%num = 0
   end function array_constructor

   !----------------------------------------------

   subroutine push_back( this, i )
      class(array), intent(inout) :: this
      integer(int64) i
      integer(int64), allocatable :: temp(:)

      if ( size(this%A) == this%num ) then       ! Need to resize
         allocate( temp( 2 * this%num ) )        ! <==== for example
         temp(1:this%num ) = this%A
         call move_alloc( temp, this%A )
!        print *, "Resized to ", size( this%A )  ! debugging only!!!
      end if

      this%num = this%num + 1
      this%A(this%num) = i

   end subroutine push_back

   !----------------------------------------------

   subroutine print( this )
      class(array), intent(in) :: this
      write( *, "( *( i0, 1x ) )" ) ( this%A(1:this%num) )
   end subroutine print

end module container

!=======================================================================

program test
   use iso_fortran_env
   use container
   implicit none
   type(array) mat
   integer(int64) :: n = 2000000_int64
   integer(int64) i
   real(real64) z, START_CLOCK, STOP_CLOCK

   mat = array()                       ! initial trivial allocation

   call random_seed                    ! you probably need this
   call cpu_time(START_CLOCK)
   do i = 1, n
      call random_number( z )
      if ( z < 0.5_real64 ) call mat%push_back( i )
   end do

   call cpu_time(STOP_CLOCK)
   print *, 'Run took ', ( STOP_CLOCK - START_CLOCK ) / 60.0_real64, ' minutes.'

!  call mat%print                      ! debugging only!!!

end program test
lastchance
  • 1,436
  • 1
  • 3
  • 12
  • This a classic case where Fortran needs generics. Maybe one day ... – John Alexiou Feb 09 '23 at 19:25
  • @lastchance I run it using `n = 2150000000_int64` and `print *, "Resized to ", size( this%A )` in order to check. In the last resize, I got `Resized to -2147483648`. Any idea about the negative sign? – geom Feb 13 '23 at 09:32
  • 1
    @geom, you nearly crippled my PC trying to replicate that! It's a monumental amount of memory. I suspect that you simply exceeded the range of the integer type that can be used for array indices. If it's a signed rather than unsigned integer it will wrap around to negative values again once it "goes off the top". Do you really need to store all these values? – lastchance Feb 13 '23 at 20:11
  • @lastchance yes, I do need to store all these values (unfortunately). Mystery solved (https://fortran-lang.discourse.group/t/size-of-long-array/3676) – geom Feb 13 '23 at 20:54