2

How would one OpenMP parallelize following sparse matrix vector multiplication where matrix is in compressed sparse column format?

do i=1,lastcol
   do k=ia(i),ia(i+1)-1
      ind=ja(k)
      y(ind)=y(ind)+x(i)*a(k)
   end do
end do

Here, ia, ja, and a are column pointer, row index, and nonzero value of matrix respectively. Thanks.

tiki
  • 419
  • 1
  • 6
  • 16

1 Answers1

2

Does the following work for you (updated with ATOMIC clause to prevent problem identified by Massimiliano)

!$ OMP PARALLEL DO PRIVATE(k, ind, temp)
do i=1,lastcol
   do k=ia(i),ia(i+1)-1
      ind=ja(k)
      temp = x(i)*a(k)
      !$ OMP ATOMIC
      y(ind)=y(ind)+temp
      !$ OMP END ATOMIC
   end do
end do
!$ OMP END PARALLEL DO

This should divide the "work" of the outer loops over a number of different processors, whilst making sure that there are separate copies of the inner loop variables k and ind

It's been a while since I have used OMP - if this doesn't work for you please use the comments to let me know. Meanwhile there is a very nice reference/tutorial here

Also - you will find a similar question was asked earlier - although the language was C, the basic loop structure was very similar. The conversation there suggests that when the matrix gets quite large (exceeding the size of the cache), the speedup from parallelization is minimal.

Community
  • 1
  • 1
Floris
  • 45,857
  • 6
  • 70
  • 122
  • I've used this exact approach before and it worked fine. You can also condense it down to one OMP directive: !$omp parallel do private(k,ind) You don't need to specify ``i`` as private when you use parallel do. – Daniel Shapero Jul 10 '13 at 00:35
  • @korrok thanks. I wasn't sure about including `i` as `private` - if you look at the edit history you will see I added it after the initial post, just to be sure. Agree condensing should work. What is the closing statement in that case? `END PARALLEL DO`? – Floris Jul 10 '13 at 01:11
  • 1
    The indirection in `ind=ja(k)` opens for race-conditions if you don't protect `y(ind)=y(ind)+x(i)*a(k)` with an `atomic` clause. – Massimiliano Jul 10 '13 at 04:16
  • @Massimiliano - if that is the case, I don't think you can get much from using OMP. But doesn't the fact that `i`, `k` and `ind` are private mean that won't happen? `ja` is not changed inside the loop - and the point of a sparse array should be that there is only one entry for each `ind`. Am I wrong? – Floris Jul 10 '13 at 04:34
  • @Floris Yes. CSC is a sparse matrix format (not a sparse array), and the fact that a given index `ind` appears more than once in `ja` means that there are different columns with non-zero entries at the row `ind`. In case you are interested you can find more information [here](https://en.wikipedia.org/wiki/Sparse_matrix) – Massimiliano Jul 10 '13 at 04:51
  • @Massimiliano - I have updated the code. I believe this addresses the race condition you identified; thanks for your inputs. – Floris Jul 10 '13 at 12:41
  • @korrok - I have updated my code based on your suggestion; thanks! – Floris Jul 10 '13 at 12:42
  • @Massimiliano -- great point about the potential race condition! I didn't notice that because I usually use CSR instead of CSC; the indirection is on the right-hand side of the assignments and not the left. – Daniel Shapero Jul 10 '13 at 19:19
  • @tiki - of course, thanks! I added it after making the innermost statement `ATOMIC` and forgot to update. Good catch! – Floris Jul 13 '13 at 00:50