Most cpu-efficient way to populate a (symmetric) distance matrix in fortran

Question

I am working on implementing periodic boundary conditions in a finite element problem and I want to pair the nodes on boundary A with nodes on boundary B given a vector trans that lines boundary A up with boundary B. The nodes in boundary A are given in a list g1; in B they are g2. The node coordinates are looked up in mesh%nodes(:,nodenum).

I have decided to do this by creating a distance matrix for each node, which I realise is not the most efficient way, and to be honest I don't expect to save significant time by optimising this algorithm. The question is more academic.

I know that Fortran stores in column-major order, on the other hand, the array will be symmetric and when the array is completed I want to take column slices of it to find the nearest node. So the question is how should one populate this?

Here is my naive attempt.

subroutine autopair_nodes_in_groups(mesh, g1, g2, pairs, trans)

    type(meshdata)                              :: mesh
    
    
    integer(kind=sp)                            :: i,j
    integer(kind=sp),dimension(:)               :: g1,g2
    integer(kind=sp),dimension(:,:)             :: pairs
    
    real(kind=dp)                               :: trans(3) !xyz translate
    
    real(kind=dp)                               :: dist_mat(size(g1),size(g2)) 
    real(kind=dp)                               :: p1(3), p2(3)
    
    dist_mat = -1.0_wp
    
    ! make a distance matrix
    do j=1,size(g2)
        p2 = mesh%nodes(1:3,g2(j))-trans
        do i=1,j
            p1 = mesh%nodes(1:3,g1(i))
            dist_mat(i,j) = norm2(p1-p2)              !equivalent to norm2(n1pos+trans-n2pos)
            if (i.ne.j) dist_mat(j,i) = dist_mat(i,j) !fill symmetry
        end do
    end do
    
    ! Remainder of routine to find nearest nodes
    
    
end subroutine autopair_nodes_in_groups

The problem as far as I can tell is that this is efficient in terms of memory access until one symmetrises the array.

You might look at the sort of related https://stackoverflow.com/questions/68344472/transposition-of-a-matrix-by-multithread-in-fortran/68351049 which might give some ideas. I would also make the inner loop to `j-1` and set the diagonal element in the outer loop, saving a few ops and more importantly the `If` in the inner loop. But best would be if you can structure the later code so you only need a triangle of the distance matrix. Also do you really need the sqrt (implied by `norm2`)? Can you work with distance ** 2? — Ian Bush, Apr 28 '22 at 09:16
thanks for understanding my question! I was specifically worried about just saving the upper triangle and then needing to do a row-slice later in fortran. But you and Perini both seem to be not concerned about it. Maybe it shows me that I don't have a good handle on how to evaluate costs of computation because I thought that would be much worse than the if clause TBH. — Not a chance, Apr 28 '22 at 12:14

Federico Perini · Accepted Answer · 2022-04-28T12:28:39.740

To do a fast nearest-neighbor search, you should implement a tree structure that has search complexity O(log(N)) instead of looking at all point-to-point distances which are O(N^2).

Anyways, regarding symmetric matrix handling, you'll have:

! Storage size of a symmetric matrix
elemental integer function sym_size(N)
   integer, intent(in) :: N
   sym_size = (N*(N+1))/2
end function sym_size

! Compute pointer to an element in the symmetric matrix array
elemental integer function sym_ptr(N,i,j) 
   integer, intent(in) :: N,i,j

   integer :: irow,jcol

   ! Column-major storage
   irow = merge(i,j,i>=j)
   jcol = merge(j,i,i>=j)

   ! Skip previous columns
   sym_ptr = N*(jcol-1)-((jcol-1)*(jcol-2))/2

   ! Locate in current column
   sym_ptr = sym_ptr + (irow-jcol+1)

end function sym_ptr

then do your job:

   N = size(g2)
   allocate(sym_dist_mat(sym_size(N)))

   do j=1,size(g2)
        p2 = mesh%nodes(1:3,g2(j))-trans
        do i=j,size(g2)
            p1 = mesh%nodes(1:3,g1(i))
            sym_dist_mat(sym_ptr(N,i,j)) = norm2(p1-p2)  
        end do
   end do

The minloc function should then look something like this (untested):


! minloc-style
pure function symmetric_minloc(N,matrix,dim) result(loc_min)
    integer, intent(in) :: N
    real(8), intent(in) :: matrix(N*(N+1)/2)
    integer, intent(in) :: dim

    real(8) :: dim_min(N),min_column
    integer :: loc_min(N)

    select case (dim)

       ! Row/column does not matter: it's symmetric!
       case (1,2)

          dim_min = huge(dim_row)
          loc_min = -1

          ptr = 0
          do j=1,N

             ! Diagonal element m(j,j)
             ptr=ptr+1
             min_column = matrix(ptr)
             if (min_column<=dim_min(j)) then
                loc_min(j) = j
                dim_min(j) = min_column
             end if

             ! Lower-diagonal part at column j
             do i=j+1,N
               ptr=ptr+1

               ! Applies to both this column,
               if (matrix(ptr)<=dim_min(j)) then
                  loc_min(j) = i
                  dim_min(j) = matrix(ptr)
               end if

               ! And the i-th column
               if (matrix(ptr)<=dim_min(i)) then
                  loc_min(i) = j
                  dim_min(i) = matrix(ptr)
               end if

            end do

          end do

       case default
          ! Invalid dimension
          loc_min = -1
    end select

end function symmetric_minloc

thanks for your answer and I take your point about the nearest neighbour method being log N instead of N^2. The trouble with your solution is that when I go to find each nodes nearest neighbour I would slice columns and apply minloc, but that would inevitably access data in sym_dist_mat in a non-contiguous way right? Is that still a good idea ? Or how would you overcome that then ? — Not a chance, Apr 28 '22 at 12:07
Yes it's non-contiguous but still vectorizable (the pointer function to an element in the symmetric matrix is `elemental`), so I wouldn't worry about that — Federico Perini, Apr 28 '22 at 12:14

Most cpu-efficient way to populate a (symmetric) distance matrix in fortran

1 Answers1