I am learning parallelization with OpenMP and MPI. Right now I am writing a simple Gaussian elimination program without pivoting. The sequential code is as follows:

program ge
implicit none
integer, PARAMETER::n = 4
real, dimension (n,n+1)::A
integer::i,j,k
real start, finish

print *
print *, "Basic gaussian elimination"
print *, "By Jose Ortega Valiente"
print *
print *, "Square matrix dimension = ", n

call cpu_time(start)
!build matrix A (strongly diagonally dominant, so no pivoting is needed)
DO i = 1, n
    DO j = 1, n
        A(i,j) = i + j
    END DO
    A(i,i) = A(i,i)*100
END DO

! calculate RHS (B) as the row sums of A
do i = 1, n
    A(i, n+1) = 0.0
    do j = 1, n
        A(i, n+1) = A(i, n+1) + A(i,j)
    end do
end do

!let us print the matrix A|RHS
print*
print*,"Matrix A"
DO i = 1, n
    write(*,"(100g15.5)") ( A(i,j), j=1,n+1 )
END DO


!Basic Gaussian Elimination
DO k = 1, n
    !normalize pivot row k
    DO j = k+1, n
        A(k,j) = A(k,j)/A(k,k)
    END DO
    A(k,n+1) = A(k,n+1)/A(k,k)
    A(k,k) = 1.0

    !eliminate column k from the rows below
    DO i = k+1, n
        DO j = k+1, n
            A(i,j) = A(i,j) - A(i,k)*A(k,j)
        END DO
        A(i,n+1) = A(i,n+1) - A(i,k)*A(k,n+1)
        A(i,k) = 0.0
    END DO
END DO


!Let us print the result matrix A|RHS
print*
print*, "Matrix A result"
DO i = 1, n
    write(*,"(100g15.5)") ( A(i,j), j=1,n+1 )
END DO

call cpu_time(finish)
print *, ""
print *, ""
print *, "It took: ", finish-start

end program ge

I had no problem writing the OpenMP version of this program, but I have no idea how to start on the MPI version.

The OpenMP version is below, in case anyone is interested:

program ge
implicit none
include 'omp_lib.h'
integer, PARAMETER::n = 4
real, dimension (n,n+1)::A
integer::i,j,k
real start, finish  
integer :: my_rank, p

!$omp parallel private( my_rank )
    !$ p = omp_get_num_threads()

!$omp single
print *
print *, "Basic gaussian elimination"
print *, "By Jose Ortega Valiente"
print *
print *, "Square matrix dimension = ", n
print *, "Number of threads available: ", p
!$omp end single

start = omp_get_wtime()
!build matrix A (each thread writes a distinct set of rows, so a plain
!worksharing DO is enough; no reduction is needed)
!$omp do
    DO i = 1, n
        DO j = 1, n
            A(i,j) = i + j
        END DO
        A(i,i) = A(i,i)*100
    END DO
!$omp end do

!The RHS is very simple - No difference in parallel
    ! calculate RHS (B)
!$omp single
    do i = 1, n
        A(i, n+1) = 0.0
        do j = 1, n
                A(i, n+1) = A(i, n+1) + A(i,j)
        end do
    end do

!let us print the matrix A and the RHS
print*
print*,"Matrix A"
DO i = 1, n
    write(*,"(100g15.5)") ( A(i,j), j=1,n+1 )
END DO
!$omp end single

!Basic Gaussian Elimination
!Outermost loop (k): its iterations depend on each other, so every thread
!executes the whole k loop and only the inner loops are work-shared
DO k = 1, n
    !Inner loop (j), order doesn't affect the result
    !Dividing the rest of row k by the pivot A(k,k)
    !$omp do
        DO j = k+1, n
            A(k,j) = A(k,j)/A(k,k)
        END DO
    !$omp end do
    !only one thread updates the pivot entries
    !$omp single
    A(k,n+1) = A(k,n+1)/A(k,k)
    A(k,k) = 1.0
    !$omp end single

    !Inner loop (i), order doesn't affect the result
    !Eliminating column k
    !$omp do
        DO i = k+1, n
            DO j = k+1, n
                A(i,j) = A(i,j) - A(i,k)*A(k,j)
            END DO
            A(i,n+1) = A(i,n+1) - A(i,k)*A(k,n+1)
            A(i,k) = 0.0
        END DO
    !$omp end do
END DO

!$omp end parallel
!Let us print the results (matrix A|RHS)
print*
print*, "Matrix A result"
    DO i = 1, n
        write(*,"(100g15.5)") ( A(i,j), j=1,n+1 )
    END DO

finish = omp_get_wtime()
print *, ""
print *, ""
print *, "It took: ", finish-start

end program ge

My question is not so much about the code as about the approach. Could anyone help me understand how to tackle it? My program is not working, and I think it is because I don't really know how to approach the problem:

program ge
implicit NONE
include 'mpif.h'
integer, PARAMETER::n = 4
real, dimension (n,n)::A
real, dimension (n)::B,X
integer::i,j
integer:: norm,row,col
real multiplier
real start, finish
integer ierr, taskid, numtasks, stat

call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, taskid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numtasks, ierr )

if (taskid == 0) then
    print *
    print *, "Basic gaussian elimination"
    print *, "By Jose Ortega Valiente"
    print *
    print *, "Square matrix dimension = ", n
    print *, "Number of Nodes = ", numtasks
end if

call cpu_time(start)
!matrix A shape 
DO i = 1, n
    DO j = 1, n
        A(i,j) = i + j
    END DO
    A(i,i)=A(i,i)*100
END DO

! calculate RHS (B) - no real difference with only 1 node
if (taskid == 0) then
        do i = 1, n
            B(i) = 0.0
            do j = 1, n
                    B(i) = B(i) + A(i,j)
            X(i) = 0 !just to initialize
            end do
        end do
end if

!let us print the matrix A and B
if (taskid == 0) then
    print*, ""
    print*,"Matrix A"
    DO i = 1, n
            write(*,"100g15.5") ( A(i,j), j=1,n )
    END DO

    print*, ""
    print*, "Vector B result"
    DO i = 1, n
            print *, B(i)
    END DO
end if


!Basic Gaussian Elimination
DO norm = 0, n-1
    ! Broadcast A(norm) row and B(norm)
    call MPI_Bcast(A(norm,0), n, MPI_REAL, 0, MPI_COMM_WORLD, ierr)
    call MPI_Bcast(B(norm), MPI_REAL, 0, MPI_COMM_WORLD, ierr)
    ! Send data from Node 0 to other Nodes
    if (taskid == 0) then
        DO i=1, numtasks
            DO row=norm+1+i, n, numtasks
                DO j=1, n
                    call MPI_SEND(A(row,j),n,MPI_REAL,i,0,MPI_COMM_WORLD,ierr)
                END DO
                call MPI_SEND(B(row),1,MPI_REAL,i,0,MPI_COMM_WORLD,ierr)
            END DO
        END DO

        DO row=norm+1, n, numtasks
            multiplier = A(row,norm)/A(norm,norm)
            DO col=norm, n
                A(row,col) = A(row,col) - A(norm,col)*multiplier
            END DO
            B(row) = B(row) - B(norm)*multiplier
        END DO

        !receive the updated data from the other Nodes
        DO i=1, numtasks
            DO j=1, n
                call MPI_RECV(A(row,j),n,MPI_REAL,i,1,MPI_COMM_WORLD,stat, ierr)
            END DO
            call MPI_RECV(B(row),1,MPI_REAL,i,1,MPI_COMM_WORLD,stat, ierr)
        END DO

    else    !receive data from process 0
        DO row=norm+1+taskid, n, numtasks
            DO j=1,n
                call MPI_RECV(A(row,j),n,MPI_REAL,0,0,MPI_COMM_WORLD,stat, ierr)
            END DO
            call MPI_RECV(B(row),1,MPI_REAL,0,0,MPI_COMM_WORLD,stat, ierr)

            multiplier = A(row,norm)/A(norm,norm)
            DO col=norm, n
                A(row,col) = A(row,col)-A(norm,col)*multiplier
            END DO
            B(row) = B(row) - B(norm)*multiplier

            !Send back the results
            DO j=1, n
                call MPI_RECV(A(row,j),n,MPI_REAL,0,1,MPI_COMM_WORLD,stat, ierr)
            END DO
            call MPI_RECV(B(row),1,MPI_REAL,0,1,MPI_COMM_WORLD,stat, ierr)
        END DO
    end if

END DO

!Back substitution
if (taskid == 0) then
    DO row=n-1, 0, --1
        X(row) = B(row)
        DO col=n-1, row, --1
            X(row) = X(row) - A(row,col)*X(col)
        END DO
        X(row) = X(row)/A(row,row)
    END DO
end if


call cpu_time(finish)

!Let us print the result matrix A and B
if (taskid == 0) then
    print*, ""
    print*, "Matrix A result"
    DO i = 1, n
        write(*,"(100g15.5)") ( A(i,j), j=1,n )
    END DO
    print*, ""
    print*, "Vector B result"
    DO i = 1, n
        print *, B(i)
    END DO
    print*, ""
    print*, "Vector X result"
    DO i = 1, n
        print *, X(i)
    END DO


    print *, ""
    print *, ""
    print *, "It took: ", finish-start
end if

call MPI_FINALIZE(ierr)

end program ge
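
To make the approach I am attempting clearer, here is the pattern I had in mind, stripped down to a skeleton. This is only a sketch of the work-splitting idea, not my actual code: it assumes every node keeps a full copy of A and B (so only the work is distributed, not the data), the node that owns the pivot row broadcasts it at each step, and the remaining rows are shared out cyclically.

program ge_mpi_sketch
implicit none
include 'mpif.h'
integer, PARAMETER::n = 4
real, dimension (n,n)::A
real, dimension (n)::B, rowbuf
real multiplier
integer::i,j,k,row,owner
integer ierr, taskid, numtasks

call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, taskid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numtasks, ierr )

!every node builds the same full copy of A and B (only the work is
!distributed, not the data), exactly as in the sequential version
DO i = 1, n
    DO j = 1, n
        A(i,j) = i + j
    END DO
    A(i,i) = A(i,i)*100
    B(i) = sum(A(i,1:n))
END DO

DO k = 1, n
    !the node that owns row k broadcasts it, so everyone works with the
    !already-eliminated version of the pivot row; a row of A is NOT
    !contiguous in memory (Fortran is column-major), so it is copied into
    !a contiguous buffer first
    owner = mod(k-1, numtasks)
    if (taskid == owner) rowbuf(1:n) = A(k,1:n)
    call MPI_BCAST(rowbuf, n, MPI_REAL, owner, MPI_COMM_WORLD, ierr)
    call MPI_BCAST(B(k), 1, MPI_REAL, owner, MPI_COMM_WORLD, ierr)
    A(k,1:n) = rowbuf(1:n)

    !each node eliminates only the rows below k that it owns (cyclic ownership)
    DO row = k+1, n
        if (mod(row-1, numtasks) /= taskid) cycle
        multiplier = A(row,k)/A(k,k)
        DO j = k, n
            A(row,j) = A(row,j) - A(k,j)*multiplier
        END DO
        B(row) = B(row) - B(k)*multiplier
    END DO
END DO

!because every row is broadcast right after its last update, all nodes end up
!with the complete triangular system, so node 0 can do the back substitution
!and the printing on its own

call MPI_FINALIZE(ierr)
end program ge_mpi_sketch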

  • Don't use `cpu_time` for OpenMP programs, use `omp_get_wtime`. See http://stackoverflow.com/questions/10673732/openmp-time-and-clock-calculates-two-different-results for an explanation. – High Performance Mark Dec 02 '15 at 17:59
  • You won't be happy that I am repeating myself, but this is not a very good question for this site. Yes, MPI and OpenMP are absolutely different, but we cannot write a whole tutorial here. There are whole websites and books for that. The most important thing is to realize that in MPI you have a set of independent processes, each typically having just part of the matrix, and you must somehow coordinate their work on those pieces using the MPI subroutine calls. – Vladimir F Героям слава Dec 02 '15 at 21:03
  • Actually, I'm not happy. I am already checking websites and books, since I want to get this done. I am sorry if I wasn't clear when I said that my question was not so much about the code as about the approach. It doesn't help me to know the directives if I am not able to use them, does it? I've done easier MPI tasks to try to get some ideas, but for this Gaussian code of mine, nothing works... – Jose Ortega Valiente Dec 02 '15 at 22:34
  • If you tried something, you should make a question about why your code fails. That would be a question much better suited to this site. This site is about code, less about discussing approaches. Others may disagree and write a good answer, but I find it hard given that we don't know where you struggle or which aspect of MPI you may have trouble understanding. – Vladimir F Героям слава Dec 02 '15 at 22:42
  • Is this related? http://cseweb.ucsd.edu/classes/fa98/cse164b/Projects/PastProjects/LU/ (If I google for "parallel Gaussian elimination", there seem to be some more good pages and articles.) Also, for the strategy of parallelization, this forum may be worth trying http://scicomp.stackexchange.com/ – roygvib Dec 02 '15 at 22:53
  • I edited the original post to include the MPI code I'm trying to compile. The messages I get are: "PGF90-W-0155-The number of subscripts is less than the rank of a" at ge2.f90 lines 71, 86, 92 and 102. So I guess I'm sending the wrong information, but I don't know why. – Jose Ortega Valiente Dec 02 '15 at 23:19
  • This isn't an MPI problem. You declare `a` to be of rank-2, but then reference `a(row)` in places. – francescalus Dec 02 '15 at 23:52
  • I thought I could send one array directly, like in C. I tried to handle this using a loop to send all the elements of that array. – Jose Ortega Valiente Dec 03 '15 at 02:46
  • Please do not change your code with every comment. Make a question about your code and ask why it does not work. Let people answer you. After you comprehend the problem, make a new version, and if you still have a problem, make a NEW question. In this case you can send the whole array, but you were doing it wrong. – Vladimir F Героям слава Dec 03 '15 at 09:38
  • I second @VladimirF. Stack Overflow is not a site where random people solve your random problems. It is a Q&A site, a database of problems and solutions. If you post a question and then repeatedly modify it into something different, it is of no use to anyone else except you. No one should be forced to dig into the question's edit history just to find out how the solution has progressed over time. – Hristo Iliev Dec 05 '15 at 11:10
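
Following up on the comments above about the rank-2 array and about sending a whole array at once: below is a minimal, standalone sketch of single-call transfers of pieces of a Fortran array (the values, tags and ranks are only illustrative, and it needs at least two processes to run). The key point is that Fortran stores arrays column by column, so a column of A, or the whole matrix, is one contiguous buffer, while a single row is not.

program send_sketch
implicit none
include 'mpif.h'
integer, PARAMETER::n = 4
real, dimension (n,n)::A
integer::i,j
integer ierr, taskid
integer stat(MPI_STATUS_SIZE)

call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, taskid, ierr )

A = 0.0
if (taskid == 0) then
    DO i = 1, n
        DO j = 1, n
            A(i,j) = 10*i + j
        END DO
    END DO

    !one column of A is contiguous in memory, so a single send with count n
    !(starting at its first element) transfers the whole column
    call MPI_SEND(A(1,2), n, MPI_REAL, 1, 0, MPI_COMM_WORLD, ierr)

    !the whole matrix is contiguous too: n*n elements in one call
    call MPI_SEND(A, n*n, MPI_REAL, 1, 1, MPI_COMM_WORLD, ierr)
else if (taskid == 1) then
    call MPI_RECV(A(1,2), n, MPI_REAL, 0, 0, MPI_COMM_WORLD, stat, ierr)
    call MPI_RECV(A, n*n, MPI_REAL, 0, 1, MPI_COMM_WORLD, stat, ierr)
end if

!note: a single ROW, e.g. A(3,1:n), is NOT contiguous (Fortran is
!column-major), so it has to be copied into a 1-D buffer first, or described
!with a derived datatype such as MPI_TYPE_VECTOR, before it can be sent in
!one call

call MPI_FINALIZE(ierr)
end program send_sketch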
