I have a hotspot that looks like this. Some kind of vector gather here would be nice. Any suggestions on how to get the compiler to vectorize this?

        do ii = 1, N
           if (diff(ii) .le. M ) then
              i = i0 + ii - 1
              rbuf( irb ) = i 
              irb = irb + 1               
           end if
        end do

Using ifort 16.0.2, my optimization report looks like this:

LOOP BEGIN at code.f(197,13)
   remark #25084: Preprocess Loopnests: Moving Out Store    [ code.f(203,13) ]
     remark #15344: loop was not vectorized: vector dependence prevents vectorization
     remark #15346: vector dependence: assumed FLOW dependence between irb line 201 and irb line 200
     remark #15346: vector dependence: assumed ANTI dependence between irb line 200 and irb line 201
     remark #15346: vector dependence: assumed ANTI dependence between irb line 200 and irb line 201
     remark #15346: vector dependence: assumed FLOW dependence between irb line 201 and irb line 200
     remark #25439: unrolled with remainder by 2
     remark #25015: Estimate of max trip count of loop=1600
  LOOP END

Here is a small test program:

program vect

integer :: i, ii, i0, irb
integer, parameter :: N=32
integer, parameter :: M=8
integer, dimension(N) :: diff
integer, dimension(2*N) :: rbuf

rbuf = 0

!only some values of diff will meet condition
!could be random
do ii=1, N
   diff(ii) = ii
end do

!from an outer loop
i0=1003

!this is code for filling up a buffer for an expensive vectorized
!subroutine with full vectors, irb < 2*N
irb=3

do ii = 1, N
   if (diff(ii) .le. M ) then
      i = i0 + ii - 1
      rbuf( irb ) = i 
      irb = irb + 1               
   end if
end do


!check 
do ii = 1, 2*N
   write(*,*) ii, rbuf(ii)
end do

end
user1984528
  • Is `diff` a function or an array? Could you create a minimum compilable example to play around with? – Alexander Vogt May 03 '16 at 19:20
  • @AlexanderVogt now with an example – user1984528 May 03 '16 at 22:35
  • You're copying an array, filtering out some elements based on a condition? See [this question](http://stackoverflow.com/questions/36932240/avx2-what-is-the-most-efficient-way-to-pack-left-based-on-a-mask), with SSE and AVX2 answers (and an AVX512 answer). It's possible to vectorize, but C compilers won't do it for you. Probably not Fortran compilers either. You'll need something equivalent to C intrinsics, or just call a C or asm function. – Peter Cordes May 04 '16 at 03:05

1 Answer

Depending on the target architecture, I was able to get the compiler to vectorize the loop with a directive:

!CDIR$ IVDEP
do ii = 1, N
 if (diff(ii) .le. M ) then
   i = i0 + ii - 1
   rbuf( irb ) = i 
   irb = irb + 1               
 end if
end do

Compiling with -xMIC-AVX512 or -mmic then gives vector instructions for those architectures, e.g.:

vpcompressd %zmm0, -4+vect_$RBUF.0.1(,%rax,4){%k1}      #29.15 c1

For AVX2, I think one has to resort to intrinsics or asm, as @Peter Cordes suggests in his comment, but it is good to know that the compiler can figure this out going forward.
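
For targets without a compress instruction, the same operation can also be written with the standard `count` and `pack` intrinsics. The following is only a sketch: the module and subroutine names (`compress_sketch`, `append_selected`) are made up for illustration, it assumes `rbuf` has room for every selected entry, and whether a given compiler actually vectorizes `pack` is not guaranteed.

```fortran
module compress_sketch
  implicit none
contains
  ! Hypothetical helper (name invented for this sketch).
  ! Appends to rbuf, starting at position irb, the global index i0+ii-1
  ! of every element with diff(ii) <= m, then advances irb past the
  ! last entry written.
  ! Assumption: rbuf is large enough for all selected entries (not checked).
  subroutine append_selected(diff, m, i0, rbuf, irb)
    integer, intent(in)    :: diff(:)
    integer, intent(in)    :: m, i0
    integer, intent(inout) :: rbuf(:)
    integer, intent(inout) :: irb
    integer :: ii, cnt

    ! Number of elements that pass the filter
    cnt = count(diff <= m)

    ! pack keeps, in order, only the candidates whose mask entry is true
    rbuf(irb:irb+cnt-1) = pack([(i0 + ii - 1, ii = 1, size(diff))], diff <= m)

    irb = irb + cnt
  end subroutine append_selected
end module compress_sketch
```

Called as `call append_selected(diff, M, i0, rbuf, irb)` with the values from the test program in the question, this fills `rbuf(3:10)` with 1003 through 1010 and leaves `irb` at 11, the same result as the original loop.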

user1984528
  • Cool, I'm glad compilers are smart enough to use `vpcompressd` / `vcompressps`. The instruction seems to be designed for exactly this use-case. Now that compilers can recognize this pattern and use `vcompress`, maybe they'll learn to auto-vectorize with AVX2 and BMI2. (My technique of generating the shuffle mask on the fly gives a speedup over scalar without requiring a lookup-table at all, and not too much code-size. :) – Peter Cordes May 04 '16 at 18:30