3

I'm trying to create a sparse square matrix in Matlab through a mex function (written in Fortran). I want something like A = sparse(I,J,K) . My triplets look like this, there are repetitions among the entries

femi = [1, 2, 3, 2, 2, 4, 5, 5, 4, 6, 6, 5, 5, 2]
femj = [2, 2, 1, 1, 1, 3, 3, 6, 3, 1, 1, 2, 2, 4]
femk = [2, 1, 5, 4, 2, 4, 5, 7, 2, 1, 6, 2, 1, 4]

I've written a rough piece of code, it works for small matrix dimensions, but it's much slower than the intrinsic Matlab's sparse. Since I have almost no background in coding, I don't know what I'm doing wrong (wrong way to allocate variables? too many do loops?). Any help is appreciated. Thank you. This is the mex computational subroutine. It returns the pr, ir, jc indices array to give to the sparse matrix

subroutine new_sparse(femi, femj, femk, pr, ir, jc, n, m)

      implicit none
      intrinsic:: SUM, COUNT, ANY
      integer :: i, j, k, n, indjc, m
      real*8  :: femi(n), femj(n), femk(n)
      real*8  :: pr(n)
      integer :: ir(n),jc(m+1)
      logical :: indices(n)

      indices = .false.
      k = 1
      indjc = 0
      jc(1) = 0
      do j=1,m
            do i =1,m
                indices = [femi==i .and. femj==j]
                if (ANY(indices .eqv. .true.)) then                                     
                    ir(k) = i-1
                    pr(k) = SUM(femk, indices)
                    k = k+1
                    indjc = indjc + 1
                end if
            end do
            if (indjc/=0) then
                jc(j+1) = jc(j) + indjc
                indjc = 0
            else
                jc(j+1) = jc(j) 
            end if
      end do


      return      
      end

Edit:
As suggested by users @jack and @veryreverie in the comments below, it's better to sort directly femi, femj and femk. I guess that ranking/sorting femi first (and sorting femj and femk according to femi) and then ranking/sorting femj (and sorting femi and femk according to femj) provides the desired result. The only thing left is to deal with duplicates.

Edit #2 :
I translated line by line the serialized version of the C code by Engblom and Lukarksi . This document explains very clearly their reasoning and I think it's useful for beginners like me. However, due to my inexperience, I was unable to translate the parallelized version of the code. Maybe that prompts another question.

    subroutine new_sparse(ir, jcS, pr, MatI, MatJ, MatK, n, m)

! use omp_lib

implicit none
integer,  parameter   :: dp = selected_real_kind(15,300)
integer,  intent(in)  :: n, m 
real(dp), intent(in)  :: MatK(n),     MatI(n),     MatJ(n)
! integer*8,  intent(out) :: nnew
integer ::  i, k, col, row, c, r !, nthreads
integer ::  hcol(m+1), jcS(m+1), jrS(m+1)
integer ::  ixijs, irank(n), rank(n)
real*8  ::  pr(*)
integer ::  ir(*)

hcol = 0
jcS  = 0
jrS  = 0
    
do i = 1,n
    jrS(MatI(i)+1) = jrS(MatI(i)+1)+1
end do
do r = 2,m+1
    jrS(r) = jrS(r) + jrS(r-1)
end do

do  i = 1,n 
     rank(jrS(MatI(i))+1) = i
     jrS(MatI(i)) = jrS(MatI(i)) + 1
end do

k = 1
do row = 1,m
    do i = k , jrS(row) 
        ixijs = rank(i)
        col = MatJ(ixijs)
        if (hcol(col) < row) then
            hcol(col) = row
            jcS(col+1) = jcS(col+1)+1
        end if
        irank(ixijs) = jcS(col+1)
        k = k+1
    end do
end do

do c = 2,m+1  
    jcS(c) = jcS(c) + jcS(c-1)
end do

do i = 1,n
    irank(i) = irank(i) + jcS(MatJ(i))
end do    

ir(irank) = MatI-1
do i = 1,n
    pr(irank(i)) = pr(irank(i)) +  MatK(i)
end do

return    
end
Kadiya
  • 63
  • 1
  • 6
  • Welcome, I suggest taking the [tour]. – Vladimir F Героям слава Jun 01 '21 at 20:02
  • Why are you trying to write your own sparse function? MATLAB has a mature and fast sparse functionality already. – hpaulj Jun 01 '21 at 20:09
  • I'm solving a problem through finite element method (200k elements). I have a Mex function written in Fortran that does all the nested element loop calculations and that creates the triplets that will be "fed" to Matlab to create a sparse matrix in the global calculations. It takes 100k "time" steps to complete the global loop. So, I was wondering if there is a faster alternative to save some time. – Kadiya Jun 02 '21 at 07:05
  • Am I correct in the assumption that your inputs `femi, femj, femk` are of form `(row_i,col_i,val_i)` with repetitions and your output should be of [CSC-form](https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_(CSC_or_CCS))? – jack Jun 02 '21 at 07:38
  • 1
    What about just ordering your `femi,femj,femk`, then adding the repititions, then outputing that? Matlab can read that format: [`S = sparse(i,j,v)`](https://de.mathworks.com/help/matlab/ref/sparse.html). – jack Jun 02 '21 at 07:42
  • Yes, the triplets are `(row_i, col_i, val_i)`. Their length is about 30 millions. Matlab's `sparse` can handle the repetitions without any issues, so I don't need to rank/sort them if I'm using that. I was just thinking about saving time even in a simple operation like creating a sparse, since for my problem I have to do it 100k times at least, and globally it takes 6 hours to complete the simulation. – Kadiya Jun 02 '21 at 07:52
  • What is your desired output format? – jack Jun 02 '21 at 08:03
  • Matlab's CSC-form. – Kadiya Jun 02 '21 at 08:10
  • @Kadiya As there are repetitions in `femi, femj, femk` the arrays `pr, ir` could be shorter than `n`, or not? It might be better to define them `allocatable` and shorten them in the end. – jack Jun 02 '21 at 12:04
  • Kadiya, the algorithm results in the elements sorted in ascending order of `j` index, so you might as well sort your inputs as @jack suggests. If you sort them using a major key of `j` and a minor key of `i`, then writing the conversion algorithm is trivial, and the whole thing will run in `O(n*log(n))` time, much faster than either your algorithm or mine. – veryreverie Jun 02 '21 at 13:43
  • @veryreverie @jack if I understood correctly, should I rank/sort `j` and then sort `i` according to `j` ? By using [OrderPack](http://www.fortran-2000.com/rank/#3.1) for example? I don't know beforehand how many nonzeros there will be when I create in Mex `plhs(1) = mxCreateSparse (m, m, nnz, 0)` , I only know that `m=n/144` . – Kadiya Jun 02 '21 at 14:03
  • @Kadiya you need to sort all three of `femi`, `femj` and `femk` by the same sort key, such that `femj` is in ascending order, and the sections of `femi` corresponding to each value in `femj` are in ascending order. – veryreverie Jun 02 '21 at 14:05
  • 1
    @veryreverie yes, that was my first attempt at the beginning, full sorting of `femj` , `femi`, and `femk` according to `j` , then sorting `femi` and `femk` section-wise according to `i` and returning the fully sorted arrays as outputs, but it crashed for large `n` dimensions, probably because I'm not familiar with the concept of dynamic allocating – Kadiya Jun 02 '21 at 14:16
  • @Kadiya if you get stuck writing the routine to deal with duplicates from the sorted list, I suggest opening a new question, including your attempt so far. – veryreverie Jun 03 '21 at 12:53
  • For arbitrary inputs, the MATLAB sparse( ) function has to sort the data, combine duplicate entries, and handle 0's. Unless you can generate the data to skip/avoid some of these steps, I don't see the point of writing your own code for this. It is unlikely you will be able to beat the MATLAB function for speed. Can you write your algorithm to generate the data already sorted and without storing 0's etc.? – James Tursa Jun 09 '21 at 17:00
  • @JamesTursa The input is a triplet of arrays I generate from a Mex function through a `do` loop over `Nele` elements. The length of each array is `144*Nele` , while the final sparse matrix dimension is `Ndofs*Ndofs`, thus many repetitions are contained in the input triplet. Right now I can't find a way to avoid the repetitions already when assembling the triplet. I was sure from the start that there was no point in trying to create something faster than Matlab's `sparse` function, given my almost non-existent programming skills and given the complexity of a commercial software function. – Kadiya Jun 09 '21 at 17:24
  • I am taking it now as a spare time exercise to learn a little more about mex functions and coding in general. – Kadiya Jun 09 '21 at 17:24

1 Answers1

1

This should work:

module test
  implicit none
  
  ! This should probably be whatever floating point format Matlab uses.
  integer, parameter :: dp = selected_real_kind(15,300)
contains
subroutine new_sparse(femi, femj, femk, pr, ir, jc, n, m)
  integer, intent(in) :: n ! The size of femi, femj, femk.
  integer, intent(in) :: m ! The no. of rows (and cols) in the matrix.
  integer, intent(in) :: femi(n) ! The input i indices.
  integer, intent(in) :: femj(n) ! The input j indices.
  real(dp), intent(in) :: femk(n) ! The input values.
  real(dp), intent(out) :: pr(n) ! The output values.
  integer, intent(out) :: ir(n) ! The output i indices.
  integer, intent(out) :: jc(m+1) ! Column j has jc(j+1)-jc(j) non-zero entries
  
  ! loop indices.
  integer :: a,b
  
  ! Initialise jc.
  ! All elements of `jc` are `1` as the output initially contains no elements.
  jc = 1
  
  ! Loop over the input elements.
  do_a : do a=1,n
    associate(i=>femi(a), j=>femj(a), k=>femk(a))
      ! Loop over the stored entries in column j of the output,
      !    looking for element (i,j).
      do b=jc(j),jc(j+1)-1
        ! Element (i,j) is already in the output, update the output and cycle.
        if (ir(b)==i) then
          pr(b) = pr(b) + femk(a)
          cycle do_a
        endif
      enddo
      
      ! Element (i,j) is not already in the output.
      ! First make room for the new element in ir and pr,
      !    then add the element to ir and pr,
      !    then update jc.
      ir(jc(j+1)+1:jc(m+1)) = ir(jc(j+1):jc(m+1)-1)
      pr(jc(j+1)+1:jc(m+1)) = pr(jc(j+1):jc(m+1)-1)
      ir(jc(j+1)) = i
      pr(jc(j+1)) = k
      jc(j+1:) = jc(j+1:) + 1
    end associate
  enddo do_a
end subroutine
end module

program prog
  use test
  implicit none
  
  integer, parameter :: n = 14
  integer, parameter :: m = 6
  
  integer  :: femi(n), femj(n)
  real(dp) :: femk(n)
  
  real(dp) :: pr(n)
  integer  :: ir(n),jc(m+1)
  
  integer :: a,b
  
  femi = [1, 2, 3, 2, 2, 4, 5, 5, 4, 6, 6, 5, 5, 2]
  femj = [2, 2, 1, 1, 1, 3, 3, 6, 3, 1, 1, 2, 2, 4]
  femk = real([2, 1, 5, 4, 2, 4, 5, 7, 2, 1, 6, 2, 1, 4], dp)
  
  write(*,*) 'Input:'
  do a=1,n
    write(*,'(a,i0,a,i0,a,f2.0)') '(',femi(a),',',femj(a),') : ',femk(a)
  enddo
  
  write(*,*)
  
  call new_sparse(femi,femj,femk,pr,ir,jc,n,m)
  
  write(*,*) 'Output:'
  do a=1,m
    do b=jc(a),jc(a+1)-1
      write(*,'(a,i0,a,i0,a,f2.0)') '(',ir(b),',',a,') : ',pr(b)
    enddo
  enddo
end program

This writes:

Input:
(1,2) : 2.
(2,2) : 1.
(3,1) : 5.
(2,1) : 4.
(2,1) : 2.
(4,3) : 4.
(5,3) : 5.
(5,6) : 7.
(4,3) : 2.
(6,1) : 1.
(6,1) : 6.
(5,2) : 2.
(5,2) : 1.
(2,4) : 4.

Output:
(3,1) : 5.
(2,1) : 6.
(6,1) : 7.
(1,2) : 2.
(2,2) : 1.
(5,2) : 3.
(4,3) : 6.
(5,3) : 5.
(2,4) : 4.
(5,6) : 7.

The bottleneck in your algorithm comes from the instructions indices = [femi==i .and. femj==j], any(indices .eqv. .true.) and sum(femk, indices). These all take O(n) operations, and as these are within a double loop the overall cost of the subroutine is O(m^2*n).

My algorithm works in two stages. The first stage, the do b=jc(j),jc(j+1)-1 loop, compares each element in the input with each element in the matching column of the output, for a maximum cost of O(mn) operations. If the input element is found in the output, then the value is updated and nothing more needs to be done.

If the input element is not found in the output, then it needs to be added to the output. This is handled by the second stage, the code after the do b... loop. Since this needs to move the output elements in order to make space for the new element, this stage has a maximum of O(n'^2) operations, where n' is the number of unique elements in the input, which should satisfy n'<=n and n'<<m^2 for a sparse matrix.

My algorithm should run a lot faster for large m and n, but it certainly has a lot of scope for improvement. I suspect it is worth using an intermediate data structure for storing ir and pr, so that new elements can be inserted without having to re-arrange all the elements to do so.

veryreverie
  • 2,871
  • 2
  • 13
  • 26
  • @veryreverie Thank you for this subroutine, it's a huge improvement . My only quibble is that sparse indexing there is zero-based, so `jc=jc-1` . Thanks a lot – Kadiya Jun 02 '21 at 13:54
  • @Kadiya ah, yep, I don't know any matlab, so everything I've written uses fortran conventions and may need some modifications to suit matlab. Also, see my comment on your question above: my algorithm here is actually significantly slower than you can manage with sorting. – veryreverie Jun 02 '21 at 14:01