Fast swap two dimension in an array in Fortran

Question

Suppose I have a multi-dimension array in Fortran A(:,:,:,:), where the size of each dimension is the same, n.

I would like to swap the second and third dimension. I can do a reshape A2 = reshape(A, (/ n, n, n, n/), order = (/1,3,2,4/) ) Is there any faster way? Since I only need a partial swap.

Updated Example code

Program reshape_test

  Use, Intrinsic :: iso_fortran_env, Only :  wp => real64, li => int64

  real :: A(32,32,32,32), A2(32,32,32,32)
  integer :: i,j,k,l,n,m, m_iter
  Integer( li ) :: start, finish, rate
  
  
  m_iter = 3200
  n = 32
  do l = 1, n
    do k = 1, n
       do j = 1, n
         do i = 1, n
           A(i,j,k,l) = i + j * k**2 - l
         end do 
       end do   
    end do      
  end do        
 
  Call System_clock( start, rate )
  do m=1,m_iter
    A2 = reshape(A, (/n, n, n, n/), order = (/1,3,2,4/) ) 
  end do
  
 ! write (*,*) A(1,2,3,4), A(1,3,2,4), A2(1,2,3,4), A2(1,3,2,4)
  Call System_clock( finish, rate )  
  
  Write(*,*) 'Time for reshape-1', Real( finish - start, wp ) / rate  
  
  
  Call System_clock( start, rate )
  do m=1,m_iter

  do j = 1, n
    do i = 1, n  
      A2(:,i,j,:) =  A(:,j,i,:)
    end do      
  end do      
  end do
  
  !write (*,*) A(1,2,3,4), A(1,3,2,4), A2(1,2,3,4), A2(1,3,2,4)
  Call System_clock( finish, rate )  
  
  Write(*,*) 'Time for reshape-2', Real( finish - start, wp ) / rate  
    
    
  Call System_clock( start, rate )
  do m=1,m_iter 
  do l = 1, n
    do k = 1, n
       do j = 1, n
         do i = 1, n
           A2(i,k,j,l) = A(i,j,k,l)
         end do 
       end do   
    end do      
  end do   
  end do
  
 ! write (*,*) A(1,2,3,4), A(1,3,2,4), A2(1,2,3,4), A2(1,3,2,4)
  Call System_clock( finish, rate )      
  
  Write(*,*) 'Time for reshape-3', Real( finish - start, wp ) / rate  
    
        

end program reshape_test

In this example, by gfortran -o3, I got (I run three times)

 Time for reshape-1   13.500307800000000
 Time for reshape-2   10.146418400000000
 Time for reshape-3   20.489294800000000

 Time for reshape-1   11.421597100000000
 Time for reshape-2   9.3823936999999997
 Time for reshape-3   19.856221900000001

 Time for reshape-1   14.376207500000000
 Time for reshape-2   10.756465400000000
 Time for reshape-3   21.301044500000000

Adding -march=native leads to (I run three times)

 Time for reshape-1   12.517529200000000
 Time for reshape-2   10.240939200000000
 Time for reshape-3   26.637998799999998

 Time for reshape-1   12.436479700000000
 Time for reshape-2   9.9803838000000002
 Time for reshape-3   21.760061700000001

 Time for reshape-1   11.419214300000000
 Time for reshape-2   10.908747200000001
 Time for reshape-3   21.445951399999998

Overall approach 2 is the fastest. -march=native seems not very effective.

(code style of report timing follows answer in How to speed up reshape in higher rank tensor contraction by BLAS in Fortran?)

`write(*,*) 'Time for reshape', Real(finish - start, wp) / rate ` when the reshape is outside the timing determination? Also the initialization loop is a bit inefficient (see comment from Jack with the suggestion to check) — albert, Apr 01 '21 at 13:09
@AlphaF20 Note that gfortran throws an error because you have a typo. It is `-march=native` instead of `-martch=native` (arch like architecture). — jack, Apr 02 '21 at 13:09
I think you need to make the array bigger, or run each case a few thousand times to get more reliable numbers, and take the write-statement out of the timing. These results could be random. It is strange that `-march=native` gives consistently worse timings. — Jonatan Öström, Apr 07 '21 at 13:51

Jonatan Öström · Answer 1 · 2021-04-07T13:40:07.763

1

Why don't you try to compare reshape with explicit loops like

do i = 1, n
  do j = 1, n
     do k = 1, n
       do l = 1, n
         A2(i,k,j,l) = A(i,j,k,l)
       end do 
     end do   
  end do      
end do

and then you tell us the answer. It could also depend on optimization. I would suggest -O3 -march=native, and try changing to -O2, -O1 and -O0.

edited Apr 07 '21 at 13:40

answered Apr 01 '21 at 12:29

Jonatan Öström

2,428
1
16
27

3

Is it not more efficient to loop over the `l` index first (column-major ordering)? Or do modern compiler optimize that themselves? – jack Apr 01 '21 at 12:36
2

Optimisers might be able to sort out simple cases, but personally I would much prefer to do it myself - then I know it is right. And it's not any less readable – Ian Bush Apr 01 '21 at 15:10
What about leaving out the explicit `i` loop altogether, i.e. `A2(:,k,j,l) = A(:,j,k,l)`? – jack Apr 01 '21 at 15:16
Thanks. I will report in my question. – AlphaF20 Apr 01 '21 at 16:31
@jack In deed the loop order should be the opposite! I just copied from the question without checking. Using ":" instead of "i" is fine, but in this case I would not because then there is one more unknown. – Jonatan Öström Apr 07 '21 at 13:38

Fast swap two dimension in an array in Fortran

1 Answers1