All the code below is compiled with -O3 -xHost (or, for gfortran, -Ofast -march=native).
example1
subroutine f(a,b,c)
  implicit none
  integer, parameter :: r8=selected_real_kind(15,9)
  real(kind=r8), intent(in)  :: a(1000), b(1000)
  real(kind=r8), intent(out) :: c(1000)
  c(:)=a(:)+b(:)
  return
end subroutine f
The above code should be vectorized, right? real(kind=r8) is a 64-bit (8-byte) double, and AVX2 provides 256-bit registers, so one vector instruction can do something like c(1:4) = a(1:4) + b(1:4), since 256/64 = 4.
But why do I frequently find that writing the explicit loop is faster? See example 2:
example2
subroutine f(a,b,c)
  implicit none
  integer, parameter :: r8=selected_real_kind(15,9)
  real(kind=r8), intent(in)  :: a(1000), b(1000)
  real(kind=r8), intent(out) :: c(1000)
  integer :: i
  do i=1,1000
    c(i)=a(i)+b(i)
  end do
  return
end subroutine f
I notice that in example 1 the compiler may complain that the arrays are not aligned, or something along those lines, and in practice example 2 turns out faster than example 1. But in principle examples 1 and 2 should run at the same speed, right? In example 2 the compiler should be smart enough to vectorize the loop in the same way.
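One way to settle this is to ask the compiler for its vectorization report (e.g. -fopt-info-vec in gfortran, -qopt-report in ifort) and to check both variants in one program. Below is a minimal, self-contained sketch of that comparison; the names f_array and f_loop and the driver program are my own invention, while the subroutine bodies and the 1000-element size come from the examples above.

```fortran
program compare
  implicit none
  integer, parameter :: r8 = selected_real_kind(15,9)
  integer, parameter :: n  = 1000
  real(kind=r8) :: a(n), b(n), c1(n), c2(n)
  call random_number(a)
  call random_number(b)
  call f_array(a,b,c1)   ! example 1: array syntax
  call f_loop (a,b,c2)   ! example 2: explicit do loop
  ! Same arithmetic in the same order, so the results must agree exactly.
  print *, 'max difference:', maxval(abs(c1-c2))
contains
  subroutine f_array(a,b,c)
    real(kind=r8), intent(in)  :: a(n), b(n)
    real(kind=r8), intent(out) :: c(n)
    c(:) = a(:) + b(:)
  end subroutine f_array
  subroutine f_loop(a,b,c)
    real(kind=r8), intent(in)  :: a(n), b(n)
    real(kind=r8), intent(out) :: c(n)
    integer :: i
    do i = 1, n
      c(i) = a(i) + b(i)
    end do
  end subroutine f_loop
end program compare
```

Compiling with, e.g., gfortran -Ofast -march=native -fopt-info-vec should report whether the array assignment and the do loop were each vectorized; if both are, any remaining speed difference comes from alignment peeling or remainder loops rather than from vectorization itself.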
Finally, in the code below, example 3, is there any vectorization?
example3
subroutine f(a,b,c)
  implicit none
  integer, parameter :: r8=selected_real_kind(15,9)
  real(kind=r8), intent(in)  :: a(1000), b(1000)
  real(kind=r8), intent(out) :: c(1000)
  c(:) = exp(a(:)) + log(b(:))
  return
end subroutine f
or for slightly more complicated things, like example 4?
example4
subroutine f(a,b,c)
  implicit none
  integer, parameter :: r8=selected_real_kind(15,9)
  real(kind=r8), intent(in)  :: a(1000), b(1000)
  real(kind=r8), intent(out) :: c(1000)
  c(:) = exp(a(:)) + log(b(:)) + exp(a(:)) * log(b(:))
  return
end subroutine f
I mean, is vectorization only for very basic operations, op = + - * /, of the form
a(:) = b(:) op c(:)
whereas for something more complicated, like
a(:) = log(b(:) + c(:)*exp(a(:))*log(b(:)))
vectorization may not work?
It seems that in many cases using a do loop is faster than writing things like a(:)=b(:)+c(:). The compilers seem to do a very good job of, or be highly optimized for, plain do loops.