
This question follows on from the question "Optimizing array additions and multiplications with transposes".

I am thinking about optimizing 0.1*A + 0.1*transpose(A,(1,0)) (possibly with a more general transpose) using Fortran pointers, where A is an array. (Here transpose is meant in the NumPy sense; the Fortran counterpart for higher-rank arrays seems to be reshape with the order argument.)

I am not sure whether transposing and multiplying values through a pointer will be faster than operating on the array directly. My idea was that a pointer would keep the operation within the given memory locations. If I use b = 0.1*a + 0.1*reshape(a, (/n1, n2, n3, n4/), order = (/2,1,3,4/) ), reshape may create its result in a different memory location (a temporary array).
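For reference, the reshape call above does act as a transpose of the first two indices when `n1 == n2` (which 0.1*A + 0.1*A^T needs anyway): `order = [2,1,3,4]` makes the result's second subscript vary fastest while the source elements are copied in array element order. The minimal self-checking sketch below compares it against the explicit definition `t(i,j,k,l) = a(j,i,k,l)`. The program name and the small sizes are illustrative assumptions, and the shape is written as `[n2, n1, n3, n4]`, which is the same as the question's `(/n1, n2, n3, n4/)` when `n1 == n2`.

```fortran
program reshape_order_check
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  ! Illustrative sizes (assumption); n1 == n2 so that a + a^T is defined.
  integer, parameter :: n1 = 3, n2 = 3, n3 = 2, n4 = 2
  real(dp) :: a(n1, n2, n3, n4), t(n1, n2, n3, n4)
  integer :: i, j, k, l

  ! Fill a with distinct values so that any index mix-up would be visible.
  a = reshape([(real(i, dp), i = 0, n1*n2*n3*n4 - 1)], shape(a))

  ! order = [2,1,3,4]: the result's 2nd subscript varies fastest as the
  ! elements of a are read in array element order, i.e. t(i,j,k,l) = a(j,i,k,l).
  t = reshape(a, [n2, n1, n3, n4], order = [2, 1, 3, 4])

  ! Check against the explicit definition of the transpose.
  do l = 1, n4
    do k = 1, n3
      do j = 1, n2
        do i = 1, n1
          if (t(i, j, k, l) /= a(j, i, k, l)) error stop 'reshape/order mismatch'
        end do
      end do
    end do
  end do
  print *, 'reshape(..., order=[2,1,3,4]) swaps the first two indices'
end program reshape_order_check
```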

Here is my code

  Program transpose_test
  
    use, Intrinsic :: iso_fortran_env, Only :  wp => real64, li => int64
    integer, parameter :: dp = selected_real_kind(15, 307)
  
   ! Implicit None
  
    real(dp), Dimension( :, :, :, :  ), Allocatable :: a, b
  
    integer :: n1, n2, n3, n4, i, m, n, m_iter
    integer :: l1, l2, l3, l4  
    integer(li) :: start, finish, rate
    real(dp) :: sum_time
    real(dp), target, allocatable  :: at(:,:,:,:)
    real(dp), pointer :: ap(:,:,:,:), bp(:,:,:,:)
    
    Write( *, * ) 'n1, n2, n3, n4?'
    Read( *, * ) n1, n2, n3, n4

    Allocate( a ( 1:n1, 1:n2, 1:n3, 1:n4 ) )
    
    i = 0
    do l1 = 1, n1
      do l2 = 1, n2
        do l3 = 1, n3
          do l4 = 1, n4
            a(l1, l2, l3, l4) = i 
            i = i + 1
          end do
        end do
      end do
    end do                            
  
    at = a
    ap => at
    bp => ap
    
    !print *, at
    print *, 'ap', ap
    print *, 'bp', bp
    sum_time = 0.0  
    do n = 1, m_iter  
      Call System_clock( start, rate )
      do l2 = 1, n2
        do l1 = 1, n1
          bp(l1,l2,:,:) => 0.1*ap(l1,l2,:,:) + 0.1*ap(l2,l1,:,:)
        end do
      end do        
      Call System_clock( finish, rate )
      sum_time = sum_time + Real( finish - start, dp ) / rate  
    end do 
    
    write (*,*) 'reshape pointer time', sum_time 
    print *, 'bp', bp

  End 
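Independently of the syntax error, `bp => ap` (with `ap => at`) makes `bp` and `ap` refer to the same storage, so an in-place update through `bp` reads elements of `ap` that earlier iterations have already overwritten. The small 2-D sketch below (program name and size are illustrative assumptions) shows the in-place result drifting away from the out-of-place 0.1*A + 0.1*transpose(A).

```fortran
program pointer_alias_demo
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  integer, parameter :: n = 3            ! illustrative size (assumption)
  real(dp), target :: at(n, n)
  real(dp), pointer :: ap(:, :), bp(:, :)
  real(dp) :: expected(n, n)
  integer :: i, j

  at = reshape([(real(i, dp), i = 1, n*n)], [n, n])
  ap => at
  bp => ap                               ! bp now aliases the storage of at
  if (.not. associated(bp, at)) error stop 'bp should alias at'

  ! Out-of-place reference, computed before anything is overwritten.
  expected = 0.1_dp*at + 0.1_dp*transpose(at)

  ! In-place update through the aliased pointers: for i /= j, one of
  ! bp(i,j) and bp(j,i) is computed from an already-updated ap value.
  do j = 1, n
    do i = 1, n
      bp(i, j) = 0.1_dp*ap(i, j) + 0.1_dp*ap(j, i)
    end do
  end do

  if (all(abs(at - expected) < 1.0e-12_dp)) error stop 'expected aliasing to corrupt the result'
  print *, 'max deviation from out-of-place result:', maxval(abs(at - expected))
end program pointer_alias_demo
```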

gfortran 9.3.0 gives:

   56 |           bp(l1,l2,:,:) => 0.1*ap(l1,l2,:,:) + 0.1*ap(l2,l1,:,:)
      |          1
Error: Expected list of ‘lower-bound :’ or list of ‘lower-bound : upper-bound’ specifications at (1)
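As far as I can tell, the message arises because gfortran parses a parenthesized list after the pointer name in a `=>` statement as a bounds specification (`lb:`) or bounds remapping list (`lb:ub`), not as subscripts, and the right-hand side of `=>` must be a variable with the `TARGET` or `POINTER` attribute rather than an arithmetic expression. The sketch below (hypothetical program name, illustrative shapes) shows the forms that are accepted and stores the arithmetic result with ordinary assignment `=` instead.

```fortran
program pointer_assignment_forms
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  real(dp), target :: at(2, 3, 4, 5)
  real(dp), target :: flat(6)
  real(dp), pointer :: p4(:, :, :, :), p2(:, :)
  integer :: i

  at = 0.0_dp
  flat = [(real(i, dp), i = 1, 6)]

  ! 1) Plain association: p4 and at share the same storage.
  p4 => at

  ! 2) Bounds specification "lower-bound :": same storage, but p4 is
  !    now indexed from 0 in every dimension.
  p4(0:, 0:, 0:, 0:) => at
  if (lbound(p4, 1) /= 0 .or. ubound(p4, 1) /= 1) error stop 'bounds spec failed'

  ! 3) Bounds remapping "lower-bound : upper-bound": view the rank-1
  !    target flat as a 2x3 matrix (column-major, so p2(2,3) is flat(6)).
  p2(1:2, 1:3) => flat
  if (p2(2, 3) /= 6.0_dp) error stop 'rank remapping failed'

  ! Not allowed: an expression is not a valid pointer target, e.g.
  !   p2(1:2, 1:3) => 0.1_dp*flat      ! compile-time error
  ! A computed result has to be stored with ordinary assignment "=".
  p2 = 0.1_dp*p2                       ! updates flat through the pointer
  if (abs(flat(6) - 0.6_dp) > 1.0e-12_dp) error stop 'assignment through pointer failed'
  print *, 'pointer assignment forms OK'
end program pointer_assignment_forms
```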

What is the fix for the above error message? More generally, will the pointer approach above outperform transposing the array directly, e.g.,

b = 0.1*a + 0.1*reshape(a, (/n1, n2, n3, n4/), order = (/2,1,3,4/) ) 

and how does it compare with the NumPy implementation from the question mentioned in the first paragraph?
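A pointer-free sketch of the same computation, for comparison. Assumptions: fixed sizes instead of reading them from standard input, random data, and `0.1_dp` literals (a bare `0.1` is a default-kind real, usually single precision, even when multiplied by a `real(dp)` array). It writes to a separate array `b2`, so there is no aliasing, puts the first index in the innermost loop, and times both versions with `system_clock` against the `reshape` expression. Note that `m_iter` is never assigned in the question's code, so the timing loop's trip count there is undefined; here it is set to an arbitrary 5. No timings are claimed: they depend on the compiler and on flags such as `-O3`.

```fortran
program transpose_add_arrays
  use, intrinsic :: iso_fortran_env, only: dp => real64, li => int64
  implicit none
  ! Fixed illustrative sizes (assumption); n1 == n2 is required for a + a^T.
  integer, parameter :: n1 = 40, n2 = 40, n3 = 20, n4 = 20, m_iter = 5
  real(dp), allocatable :: a(:, :, :, :), b1(:, :, :, :), b2(:, :, :, :)
  integer :: i1, i2, i3, i4, iter
  integer(li) :: start, finish, rate
  real(dp) :: t_reshape, t_loop

  allocate(a(n1, n2, n3, n4), b1(n1, n2, n3, n4), b2(n1, n2, n3, n4))
  call random_number(a)

  ! Version 1: whole-array expression with reshape(..., order = ...).
  t_reshape = 0.0_dp
  do iter = 1, m_iter
    call system_clock(start, rate)
    b1 = 0.1_dp*a + 0.1_dp*reshape(a, [n2, n1, n3, n4], order = [2, 1, 3, 4])
    call system_clock(finish)
    t_reshape = t_reshape + real(finish - start, dp)/real(rate, dp)
  end do

  ! Version 2: explicit loops into a separate array. The first index is
  ! innermost, so b2 and a(i1, ...) are traversed contiguously (column-major);
  ! the transposed read a(i2, i1, ...) is necessarily strided.
  t_loop = 0.0_dp
  do iter = 1, m_iter
    call system_clock(start, rate)
    do i4 = 1, n4
      do i3 = 1, n3
        do i2 = 1, n2
          do i1 = 1, n1
            b2(i1, i2, i3, i4) = 0.1_dp*(a(i1, i2, i3, i4) + a(i2, i1, i3, i4))
          end do
        end do
      end do
    end do
    call system_clock(finish)
    t_loop = t_loop + real(finish - start, dp)/real(rate, dp)
  end do

  ! Both versions should agree up to rounding.
  if (maxval(abs(b1 - b2)) > 1.0e-12_dp) error stop 'versions disagree'
  print *, 'reshape time:', t_reshape
  print *, 'loop time:   ', t_loop
end program transpose_add_arrays
```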

Geositta
  • Any `python` reference? Why did you tag it with python? – sudden_appearance Apr 28 '22 at 08:59
  • Not in particular; `transpose` is just the NumPy usage. I will remove the `python` tag. – Geositta Apr 28 '22 at 09:05
  • First rule of modern Fortran: don't use pointers (unless you absolutely have to). You also have multiple questions here. For the performance part https://stackoverflow.com/questions/68344472/transposition-of-a-matrix-by-multithread-in-fortran/68351049 might be of help, but really is it absolutely necessary that you are transposing the last two indices, can the code be restructured so it is the first two indices? As Fortran is column major that should be appreciably more efficient. The syntax question requires a longer explanation, and suggests you don't really understand Fortran pointers. – Ian Bush Apr 28 '22 at 09:25
  • Thanks. I will read the link mentioned. – Geositta Apr 28 '22 at 09:28
  • I noticed that I forgot to use `-O3` when compiling my Fortran code for `b = 0.1*a + 0.1*reshape(a, (/n1, n2, n3, n4/), order = (/2,1,3,4/) )`, which led to the impression that I needed to improve the performance, use pointers, etc. With `-O3`, the performance seems OK. I may close this question after a while. – Geositta May 02 '22 at 09:07
