The background of my question is related to "Optimizing array additions and multiplications with transposes". I am thinking about optimizing 0.1*A + 0.1*transpose(A,(1,0)) (possibly with a more general transpose) using Fortran pointers, where A is an array. (Here transpose is meant in the Python sense, which seems related to reshape in Fortran.)
I am not sure whether transposing/multiplying values through a pointer will be faster than using an array. My thinking is that a pointer may restrict the operation to the given memory locations, whereas if I use
b = 0.1*a + 0.1*reshape(a, (/n1, n2, n3, n4/), order = (/2,1,3,4/) )
the result of reshape may be stored in a different memory location.
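To make the reshape expression concrete, here is a minimal 2D sketch, assuming a square array (the size n = 3 and the fill values are made up for illustration). It checks that reshape with order = (/2,1/) yields the same values as the transpose intrinsic, so the sum comes out symmetric. It does not show where the compiler stores the reshape result; that is left open.

```fortran
program reshape_order_demo
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer, parameter :: n = 3          ! hypothetical size, square on purpose
  real(dp) :: a(n, n), b(n, n)
  integer :: i, j

  ! Fill a with distinct values so that a /= transpose(a)
  do j = 1, n
    do i = 1, n
      a(i, j) = real(i + n*(j - 1), dp)
    end do
  end do

  ! reshape with order = [2, 1] fills the second subscript fastest,
  ! which for a square array gives result(i, j) = a(j, i)
  b = 0.1_dp*a + 0.1_dp*reshape(a, [n, n], order=[2, 1])

  print *, 'matches transpose: ', &
       all(abs(b - (0.1_dp*a + 0.1_dp*transpose(a))) < 1.0e-12_dp)
  print *, 'b is symmetric:    ', all(abs(b - transpose(b)) < 1.0e-12_dp)
end program reshape_order_demo
```

Both lines should print T. For a non-square array the shape argument would presumably need the first two extents swapped as well, and then the sum with a would no longer conform.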
Here is my code
Program transpose_test
  use, Intrinsic :: iso_fortran_env, Only : wp => real64, li => int64
  integer, parameter :: dp = selected_real_kind(15, 307)
  ! Implicit None
  real(dp), Dimension( :, :, :, : ), Allocatable :: a, b
  integer :: n1, n2, n3, n4, i, m, n, m_iter
  integer :: l1, l2, l3, l4
  integer(li) :: start, finish, rate
  real(dp) :: sum_time
  real(dp), target, allocatable :: at(:,:,:,:)
  real(dp), pointer :: ap(:,:,:,:), bp(:,:,:,:)

  Write( *, * ) 'n1, n2, n3, n4?'
  Read( *, * ) n1, n2, n3, n4
  Allocate( a ( 1:n1, 1:n2, 1:n3, 1:n4 ) )

  i = 0
  do l1 = 1, n1
    do l2 = 1, n2
      do l3 = 1, n3
        do l4 = 1, n4
          a(l1, l2, l3, l4) = i
          i = i + 1
        end do
      end do
    end do
  end do

  at = a
  ap => at
  bp => ap
  !print *, at
  print *, 'ap', ap
  print *, 'bp', bp

  sum_time = 0.0
  do n = 1, m_iter
    Call System_clock( start, rate )
    do l2 = 1, n2
      do l1 = 1, n1
        bp(l1,l2,:,:) => 0.1*ap(l1,l2,:,:) + 0.1*ap(l2,l1,:,:)
      end do
    end do
    Call System_clock( finish, rate )
    sum_time = sum_time + Real( finish - start, dp ) / rate
  end do
  write (*,*) 'reshape pointer time', sum_time
  print *, 'bp', bp
End
gfortran 9.3.0 gives
56 | bp(l1,l2,:,:) => 0.1*ap(l1,l2,:,:) + 0.1*ap(l2,l1,:,:)
| 1
Error: Expected list of ‘lower-bound :’ or list of ‘lower-bound : upper-bound’ specifications at (1)
What would be the solution for the above error message? More generally, will the above approach outperform transposing the array, e.g.,
b = 0.1*a + 0.1*reshape(a, (/n1, n2, n3, n4/), order = (/2,1,3,4/) )
and how would it compare with the NumPy implementation from the question linked in the first paragraph?
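For reference, here is a variant of the loop that should compile, as a minimal sketch only. It assumes the intent is ordinary assignment (=) rather than pointer assignment (=>): => expects a pointer or target on its right-hand side, not an expression, and a parenthesized list on its left is read as a bounds specification, which seems to be what the error message refers to. The sizes n, n3, n4 and the second target array bt are hypothetical. bt is introduced because with bp => ap the loop would overwrite elements of ap that are still needed when the (l2,l1) counterpart is computed later.

```fortran
program symmetrize_loop_demo
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer, parameter :: n = 3, n3 = 2, n4 = 2   ! hypothetical sizes, n1 = n2 = n
  real(dp), allocatable, target :: at(:,:,:,:), bt(:,:,:,:)
  real(dp), pointer :: ap(:,:,:,:), bp(:,:,:,:)
  integer :: l1, l2

  allocate(at(n, n, n3, n4), bt(n, n, n3, n4))
  call random_number(at)

  ap => at   ! pointer assignment: ap now refers to the storage of at
  bp => bt   ! separate storage (assumption), so ap is not modified while being read

  do l2 = 1, n
    do l1 = 1, n
      ! ordinary assignment (=) of an expression through the pointer
      bp(l1, l2, :, :) = 0.1_dp*ap(l1, l2, :, :) + 0.1_dp*ap(l2, l1, :, :)
    end do
  end do

  ! Cross-check against the reshape formulation from the question
  print *, 'matches reshape version: ', &
       all(abs(bt - (0.1_dp*at + 0.1_dp*reshape(at, shape(at), order=[2, 1, 3, 4]))) < 1.0e-12_dp)
end program symmetrize_loop_demo
```

This only addresses the compile error. Whether the explicit loop is faster than the reshape expression is not something the sketch measures.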