2

I got my parallel code (conductivityMAINp.f90 and conductivityCALp.f90) work and update them below. Can I ask some more questions?

  1. I find that the results from my serial and parallel codes give the almost same value but the decimal part of the value is different. I paste one of the test result below. Do you think that this difference between decimal part is normal or there may still be something wrong with code? Do you think that the results from serial and parallel code should be exactly the same or not?

serial version (-50979.1014624820,-8.548064305026142E-013) parallel version (-50979.0937138954,-6.321723719222822E-013)

  1. I also compared the files generated by serial and parallel ones. I find that some files have different size; like these files below.

serial version par.dat 26600 con.dat 3730147

parallel version par.dat 266 con.dat 37623

I understand that different processes enter these files and write down data into them separately so the data in these files were erased and overwritten by different processes. This is why the data in these files from serial and parallel ones are different from each other. Do you think that there is a way to keep data from all processes in the same file?

  1. Would you please recommend some textbooks for the MPI skills to me? I want to have a better understanding of the parallelization.

The conductivityMAINp.f90 source code

    PROGRAM MAIN        
    USE MPI        
    USE CAL        
    IMPLICIT NONE        
    !Variables for setting up the parameters in INPUT.dat file
    CHARACTER (LEN=50)            :: na(2)                !Array to store the names of Hamiltonian files from wannier90
    INTEGER                       :: km(2)                !k point mesh
    INTEGER                       :: vd                   !Velocity direction of the Hamiltonian matrix
    DOUBLE PRECISION              :: fermi                !Fermi energy value
    DOUBLE PRECISION              :: bv                   !Broadening value
    !        
    !Variables for parameters in '.wout' file
    INTEGER                       :: sta                  !Status of files
    DOUBLE PRECISION              :: rea_c(3,3)           !Lattice constant of unit cell in real space
    DOUBLE PRECISION              :: rec_c(3,3)           !Vectors of unit cell in the reciprocal space
    !        
    !Variables for parameters in Hamiltonian ('_hr.dat') file from wannier90
    INTEGER                       :: nu_wa                !Number of wannier function
    INTEGER                       :: nu_nr                !Number of Wigner-Seitz grid point
    INTEGER, ALLOCATABLE          :: nd(:)                !Degeneracy of each Wigner-Seitz grid point
    DOUBLE PRECISION, ALLOCATABLE :: hr(:,:)              !Array to store the Hamitlonian matrix information in '_hr.dat' file
    !        
    !Internal variables
    INTEGER                       :: i, j, k, l, n        !Integer for loop
    CHARACTER (LEN=100)           :: str                  !String for transitting data
    DOUBLE PRECISION              :: tr(3)                !Array for transitting data
    DOUBLE PRECISION, ALLOCATABLE :: kp(:,:)              !Array to store the Cartesian coordinate of k-point mesh
    DOUBLE PRECISION, ALLOCATABLE :: ka(:,:,:)            !Array to store the Cartesian coordiantes of all k points
    DOUBLE COMPLEX, ALLOCATABLE   :: tb(:,:)              !Array to store the extracted tight binding Hamiltonian matrix
    DOUBLE COMPLEX, ALLOCATABLE   :: ec(:,:)              !Array to store the Eigen vector matrix
    DOUBLE PRECISION, ALLOCATABLE :: ev(:,:)              !Array to store the Eigen value on single k point
    DOUBLE COMPLEX, ALLOCATABLE   :: vh(:,:)              !Array to store the velocity of Hamiltonian matrix
    DOUBLE PRECISION              :: dk(2)                !Array to store the Delta kx and ky
    DOUBLE COMPLEX                :: sc                   !Sum of conductivity on all km(1) k points
    DOUBLE COMPLEX, ALLOCATABLE   :: ct_all(:)            !Array of ct
    DOUBLE COMPLEX                :: ct                   !Sum of conductivity on all k points
    DOUBLE COMPLEX                :: ct_total             !Sum of conductivity
    !        
    !Parameters for timer
    INTEGER                       :: cr, t00, t0, t       !Timer variables
    DOUBLE PRECISION              :: ra                   !Timer rate        
    !Parameters for MPI
    INTEGER                       :: world_size           !MPI
    INTEGER                       :: world_rank, ierr     !MPI
    INTEGER                       :: irank, j0            !MPI
    !        
    !Initializing MPI
    CALL MPI_Init(ierr)
    CALL MPI_Comm_size(MPI_COMM_WORLD, world_size, ierr)
    CALL MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)
    !        
    !Initializing timer
    IF (world_rank .EQ. 0) THEN
       CALL system_clock(count_rate=cr)
       ra = REAL(cr)
    END IF
    !        
    !Starting timer for reading and broadcasting all input parameters
    IF (world_rank .EQ. 0) THEN
       CALL system_clock(t00)
       CALL system_clock(t0)
    END IF
    !        
    !Reading the parameters in the INPUT.dat file
    IF (world_rank .EQ. 0) THEN
       !Opening INPUT.dat file
       OPEN (UNIT=3, FILE='INPUT.dat', STATUS='OLD')
       !        
       READ (UNIT=3, FMT=*)
       READ (UNIT=3, FMT='(a)') na(1)
       READ (UNIT=3, FMT=*)
       READ (UNIT=3, FMT='(a)') na(2)
       DO i = 1, 8, 1
          READ (UNIT=3, FMT=*)
       END DO
       READ (UNIT=3, FMT=*) km
       READ (UNIT=3, FMT=*)
       READ (UNIT=3, FMT=*) vd
       READ (UNIT=3, FMT=*)
       READ (UNIT=3, FMT=*) fermi
       READ (UNIT=3, FMT=*)
       READ (UNIT=3, FMT=*)
       READ (UNIT=3, FMT=*)
       READ (UNIT=3, FMT=*) bv
       !        
       !Closing INPUT.dat file
       CLOSE(UNIT=3)
       !        
    !Opening files with magnetization along z axis
    OPEN (UNIT=4, FILE=TRIM(ADJUSTL(na(2))), STATUS='OLD', IOSTAT=sta)
    OPEN (UNIT=6, FILE=TRIM(ADJUSTL(na(1))), STATUS='OLD')
    !

    END IF
    !        
    !Broadcasting parameters from rank 0 to all other ranks
    CALL MPI_Bcast(na, 50*2, MPI_char, 0, MPI_COMM_WORLD, ierr)
    CALL MPI_Bcast(km, 2, MPI_int, 0, MPI_COMM_WORLD, ierr)
    CALL MPI_Bcast(vd, 1, MPI_int, 0, MPI_COMM_WORLD, ierr)
    CALL MPI_Bcast(fermi, 1, MPI_double, 0, MPI_COMM_WORLD, ierr)
    CALL MPI_Bcast(bv, 1, MPI_double, 0, MPI_COMM_WORLD, ierr)
    !        
    !Allocating array to store Cartesian coordinates of all k points
    ALLOCATE (ka(km(2),km(1),3))
    !        
    !Insitialising the array to store Carteisan coordiantes of all k points
    ka = 0.0d0
    !        
    !Reading the '.wout' file, generating coordiantes of all k points and computing delta kx and ky
    IF (world_rank .EQ. 0) THEN

       !Reading Lattice constant in real space
       DO WHILE (sta .EQ. 0)
                READ (UNIT=4, FMT='(a)', IOSTAT=sta) str
                IF (TRIM(ADJUSTL(str)) .EQ. 'Lattice Vectors (Ang)') THEN
                   DO i = 1, 3, 1
                      READ (UNIT=4, FMT='(a)', IOSTAT=sta) str
                      str = ADJUSTL(str)
                      READ (UNIT=str(4:), FMT=*) rea_c(i,:)
                   END DO
                   EXIT
                END IF
       END DO
       !        
       !Reading Vectors of unit cell in the reciprocal space
       DO WHILE (sta .EQ. 0)
                READ (UNIT=4, FMT='(a)', IOSTAT=sta) str
                IF (TRIM(ADJUSTL(str)) .EQ. 'Reciprocal-Space Vectors (Ang^-1)') THEN
                   DO i = 1, 3, 1
                      READ (UNIT=4, FMT='(a)', IOSTAT=sta) str
                      str = ADJUSTL(str)
                      READ (UNIT=str(4:), FMT=*) rec_c(i,:)
                   END DO
                   EXIT
                END IF
       END DO
       !        
       !Closing the output file with magnetization along z axis
       CLOSE (UNIT=4)
       !        
       !Generating the Cartesian coordinates for Monkhorst k-point mesh
       OPEN (UNIT=5, FILE='k_cartesian.dat', STATUS='UNKNOWN')        
       WRITE (UNIT=5, FMT='(I10)') km(1) * km(2)
       DO i = 1, km(2), 1
          DO j = 1, km(1), 1
             tr(1) = 0.0d0 + 1.0d0 / DBLE(km(1)) * DBLE(j - 1)
             tr(2) = 0.0d0 + 1.0d0 / DBLE(km(2)) * DBLE(i - 1)
             tr(3) = 0.0d0
             ka(i,j,1) = tr(1) * rec_c(1,1) + tr(2) * rec_c(2,1) +&
                         tr(3) * rec_c(3,1)
             ka(i,j,2) = tr(1) * rec_c(1,2) + tr(2) * rec_c(2,2) +&
                         tr(3) * rec_c(3,2)
             ka(i,j,3) = tr(1) * rec_c(1,3) + tr(2) * rec_c(2,3) +&
                         tr(3) * rec_c(3,3)
             WRITE (UNIT=5, FMT='(F15.8,3X,F15.8,3X,F15.8)') ka(i,j,1:3)
          END DO
       END DO

       CLOSE (UNIT=5)
       !        
       !Computing Delta kx and ky
       dk(1) = DSQRT(rec_c(1,1) ** 2 + rec_c(1,2) ** 2 + rec_c(1,3) ** 2) / DBLE(km(1))
       dk(2) = DSQRT(rec_c(2,1) ** 2 + rec_c(2,2) ** 2 + rec_c(2,3) ** 2) / DBLE(km(2))
       !
    END IF
    !        
    !Broadcasting lattice constants in both real and reciprocal spaces, the Cartesian coordiantes of all k points and
    !delta kx and ky from rank 0 to all ranks
    CALL MPI_Bcast(rea_c, 3*3, MPI_double, 0, MPI_COMM_WORLD, ierr)
    CALL MPI_Bcast(rec_c, 3*3, MPI_double, 0, MPI_COMM_WORLD, ierr)
    CALL MPI_Bcast(ka, km(2)*km(1)*3, MPI_double, 0, MPI_COMM_WORLD, ierr)
    CALL MPI_Bcast(dk, 2, MPI_double, 0, MPI_COMM_WORLD, ierr)
    !        
    !Stopping timer for reading and broadcasting all input parameters
    IF (world_rank .EQ. 0) THEN
       CALL system_clock(t)
       WRITE (*,'(A,F10.3)') "Time for INIT   (seconds):", (t - t0) / ra
    END IF
    !        
    !Starting timer for computing conductivity
    IF (world_rank .EQ. 0) THEN
       CALL system_clock(t0)
    END IF
    !        
    !Reading number of wannier function
    IF (world_rank .EQ. 0) THEN
       READ (UNIT=6, FMT=*)
       READ (UNIT=6, FMT=*) nu_wa
       !Reading number of Wigner-Seitz grind point in Hamiltonian file
       READ (UNIT=6, FMT=*) nu_nr
       !        
    END IF
    !        
    !Broadcasting number of wannier function and the degenerancy of each Wigner-Seitz grid point from rank 0 to all other ranks
    CALL MPI_Bcast(nu_wa, 1, MPI_int, 0, MPI_COMM_WORLD, ierr)
    CALL MPI_Bcast(nu_nr, 1, MPI_int, 0, MPI_COMM_WORLD, ierr)
    !        
    !Allocating the array to store the degeneracy of each Wigner-Seitz grid point
    ALLOCATE (nd(nu_nr))
    !        
    !Allocating array to store k point, Hamiltonian matrix, eigen vector matrix and eigen value
    !Allocating the array to store the Cartesian coordinates of k-point mesh
    ALLOCATE (kp(km(1),3))
    !        
    !Allocating the array to store the extracted tight binding Hamiltonian matrix
    ALLOCATE (tb(nu_wa*km(1),nu_wa))
    !        
    !Allocating the array to store the tight binding Eigen vector matrix
    ALLOCATE (ec(nu_wa*km(1),nu_wa))
    !        
    !Allocating the array to store the tight binding Eigen value
    ALLOCATE (ev(km(1),nu_wa))
    !        
    !Allocating array to store the velocity of Hamiltonian matrix
    ALLOCATE (vh(nu_wa*km(1),nu_wa*2))
    !        
    !Allocating the array to store Hamiltonian matrix information in '_hr.dat' file from wannier90
    ALLOCATE (hr(nu_wa**2*nu_nr,7))
    !        
    !Reading relevant information in Hamiltonian matrix
    !Reading the degeneracy of each Wigner-Seitz grid point
    IF (world_rank .EQ. 0) THEN
       IF (MOD(nu_nr, 15) .EQ. 0) THEN
          DO i = 1, nu_nr / 15, 1
             READ (UNIT=6, FMT=*) nd(1+(i-1)*15:i*15)
          END DO
       ELSE
          DO i = 1, nu_nr / 15, 1
             READ (UNIT=6, FMT=*) nd(1+(i-1)*15:15+i*15)
          END DO
          READ (UNIT=6, FMT=*) nd(1+(nu_nr/15)*15:nu_nr)
       END IF
       !           
       !Reading the Hamiltonian matrix information in '_hr.dat' file
       DO i = 1, nu_wa**2*nu_nr, 1
          READ (UNIT=6, FMT=*) hr(i,:)
       END DO        
       !Converting the unit number into coordinate for R in exponent term of phase factor in
       !tight binding Hamiltonian matrix for magnetic moment along z axis case
       DO i = 1, nu_wa**2*nu_nr, 1
          tr(1) = hr(i,1) * rea_c(1,1) + hr(i,2) * rea_c(2,1) + hr(i,3) * rea_c(3,1)
          tr(2) = hr(i,1) * rea_c(1,2) + hr(i,2) * rea_c(2,2) + hr(i,3) * rea_c(3,2)
          tr(3) = hr(i,1) * rea_c(1,3) + hr(i,2) * rea_c(2,3) + hr(i,3) * rea_c(3,3)
          hr(i,1:3) = tr
       END DO
       !
    END IF        
    !Broadcasting Hamiltonian from rank 0 to all other ranks
    CALL MPI_Bcast(nd, nu_nr, MPI_int, 0, MPI_COMM_WORLD, ierr)
    CALL MPI_Bcast(hr, nu_wa**2*nu_nr*7, MPI_double, 0, MPI_COMM_WORLD, ierr)
    !        
    !Opening file that stores the total conductivity value
    OPEN (UNIT=7, FILE='Conductivity.dat', STATUS='UNKNOWN')
    !        
    !!Building up the Hamitonian        
    !Initialising array used to store the total conductivity
    ct = CMPLX(0.0d0, 0.0d0)
    !
    !opening test files
    open (unit=21,file='normalisedprefactor.dat',status='unknown')
    open (unit=22,file='gd.dat',status='unknown')
    open (unit=23,file='con.dat',status='unknown')
    open (unit=24,file='par.dat',status='unknown')
    open (unit=25,file='grga.dat',status='unknown')
    open (unit=26,file='nfdk.dat',status='unknown')
    !Reading the Cartesian coordinates of k-point mesh
    DO j = 1, km(2), 1
       IF (mod(j-1, world_size) .NE. world_rank) CYCLE
       DO k = 1, km(1), 1
          kp(k,:) = ka(j,k,:)
       END DO
       !Building up Hamiltonian matrix on k points and diagonalising the matrix to obtain Eigen vectors and values
       CALL HAMCON(vd,kp,nu_wa,nu_nr,km(1),nd,hr,tb,ec,ev,vh,fermi,bv,dk,sc)
       !
       ct = ct + sc
    END DO
    !        
    CALL MPI_Barrier(MPI_COMM_WORLD, ierr)
    IF (world_rank .EQ. 0) THEN
        ALLOCATE (ct_all(world_size))
    END IF
    ct_all = CMPLX(0.0d0, 0.0d0)
    CALL MPI_Gather(ct, 1, MPI_double_complex, ct_all, 1, MPI_double_complex, 0, MPI_COMM_WORLD, ierr)        
    !Writing total conductivity value into the file
    IF (world_rank .EQ. 0) THEN
        ct_total = CMPLX(0.0d0, 0.0d0)
        DO i = 1, world_size, 1
           ct_total = ct_total + ct_all(i)
        END DO
        WRITE (UNIT=7, FMT='(A33,$)') 'Conductivity without coeffieicnt:'
        WRITE (UNIT=7, FMT=*) ct_total
        WRITE (UNIT=7, FMT='(A30,$)') 'Conductivity with coefficient:'
        WRITE (UNIT=7, FMT=*) !ct_total /
    END IF
    !        
    IF (world_rank .EQ. 0) THEN
       DEALLOCATE (ct_all)
    END IF        
    !Stopping timer for computing conductivity
    IF (world_rank .EQ. 0) THEN
       CALL system_clock(t)
       WRITE (*,'(A,f10.3)') "Time for  HAM&CON   (seconds):", (t-t0)/ra
       WRITE (*,'(A,f10.3)') "Time for  ALL       (seconds):", (t-t00)/ra
    END IF
    !        
    !Finalising MPI
    CALL MPI_Finalize(ierr)
    !        
    !Deallocating array that stores the degeneracy of each Wigner-Seitz grid point
    DEALLOCATE (nd)
    !        
    !Deallocating array that stores the Hamitlonian matrix information in '_hr.dat' file
    DEALLOCATE (hr)
    !        
    !Deallocating the array to store the Cartesian coordinates of k-point mesh
    DEALLOCATE (kp)
    !        
    !Deallocating the array to store the extracted tight binding Hamiltonian matrix
    DEALLOCATE (tb)
    !        
    !Deallocating array that stores the tight binding Eigen vector matrix
    DEALLOCATE (ec)
    !        
    !Deallocating array that stores the tight binding Eigen value
    DEALLOCATE (ev)
    !        
    !Deallocating array to store the velocity of Hamiltonian matrix
    DEALLOCATE (vh)
    !        
    !Closing files with magnetization along z axis
    CLOSE (UNIT=6)
    !        
    !Closing file that store the total conductivity
    CLOSE (UNIT=7)
    !
    close(unit=21)
    close(unit=22)
    close(unit=23)
    close(unit=24)
    close(unit=25)
    close(unit=26)        
    STOP
    END PROGRAM MAIN

The conductivityCALp.f90 source code

    MODULE CAL
    !USE MKL!LAPACK
    IMPLICIT NONE
    CONTAINS
    !Building up tight binding Hamiltonian matrix and computing eigen vector matrix and eigen value
    SUBROUTINE HAMCON(vd,kp,nu_wa,nu_ws,nu_kp,nd,hr,tb,ec,ev,vh,fermi,bv,dk,sc)
    !External variables
    INTEGER                       :: vd                   !Velocity direction of the Hamiltonian matrix
    DOUBLE PRECISION              :: kp(:,:)              !Array to store the Cartesian coordinate of k-point mesh
    INTEGER                       :: nu_wa                !Number of wannier function
    INTEGER                       :: nu_ws                !Number of Wigner-Seitz grid point for different magnetic moment direction cases
    INTEGER, ALLOCATABLE          :: nd(:)                !Degeneracy of each Wigner-Seitz grid point
    DOUBLE PRECISION, ALLOCATABLE :: hr(:,:)              !Array to store the Hamitlonian matrix information in '_hr.dat' file
    DOUBLE COMPLEX, ALLOCATABLE   :: tb(:,:)              !Array to store the extracted tight binding Hamiltonian matrix
    DOUBLE COMPLEX, ALLOCATABLE   :: ec(:,:)              !Array to store the Eigen vector matrix
    DOUBLE PRECISION              :: ev(:,:)              !Array to store the Eigen value
    DOUBLE COMPLEX, ALLOCATABLE   :: vh(:,:)              !Array to store the velocity of Hamiltonian matrix
    DOUBLE PRECISION              :: fermi                !Fermi energy value
    DOUBLE PRECISION              :: bv                   !Broadening value
    DOUBLE PRECISION              :: dk(2)                !Array to store the Delta kx and ky
    DOUBLE COMPLEX                :: sc                   !Sum of conductivity on all km(1) k points
    !
    !Internal variables
    INTEGER                       :: nu_kp                !Number of k point passed by the main code
    INTEGER                       :: i, j, k, l, m        !Integer for loop
    DOUBLE COMPLEX                :: dc(3)                !Array to store complex number i
    DOUBLE COMPLEX, ALLOCATABLE   :: tr1(:,:)             !Array for transitting Hamiltonian matrix
    DOUBLE COMPLEX, ALLOCATABLE   :: tr2(:,:)             !Array for transitting Hamiltonian matrix
    DOUBLE COMPLEX, ALLOCATABLE   :: tr3(:,:)             !Array for transitting Hamiltonian matrix
    DOUBLE COMPLEX, ALLOCATABLE   :: tr4(:,:)             !Array for transitting Hamiltonian matrix
    !
    !Variables for ZHEEV subroutine
    DOUBLE COMPLEX, ALLOCATABLE   :: a(:,:)               !Array for transitting the Eigen vector matrix
    DOUBLE PRECISION, ALLOCATABLE :: w(:)                 !Array for transitting the Eigen value
    INTEGER                       :: n, lda, lwork, info  !Parameters in ZHEEV subroutine
    DOUBLE PRECISION, ALLOCATABLE :: rwork(:)             !Parameters in ZHEEV subroutine
    DOUBLE COMPLEX, ALLOCATABLE   :: work(:)              !Parameters in ZHEEV subroutine
    !
    !Variables for computing conductivity
    DOUBLE COMPLEX                :: gr(2)                !Array to store the retarded Green functions
    DOUBLE COMPLEX                :: ga(2)                !Array to store the advanced Green functions
    DOUBLE COMPLEX                :: gd(2)                !Array to store the Gr - Ga
    DOUBLE COMPLEX, ALLOCATABLE   :: mt1(:,:)             !Array for storing conjugate eigen vectors
    DOUBLE COMPLEX, ALLOCATABLE   :: mt2(:,:)             !Array for storing eigen vectors
    DOUBLE COMPLEX, ALLOCATABLE   :: mt3(:,:)             !Array for storing conjugate eigen vectors
    DOUBLE COMPLEX, ALLOCATABLE   :: mt4(:,:)             !Array for storing eigen vectors
    DOUBLE COMPLEX, ALLOCATABLE   :: mt5(:,:)             !Array for storing velocity of Hamiltonian
    DOUBLE PRECISION, ALLOCATABLE :: nm(:)                !Array for storage of normalising prefactor
    DOUBLE COMPLEX                :: oc(2)                !Conductivity value on single k point
    !
    write(unit=24,fmt=*)vd,nu_wa,nu_ws,fermi,bv,dk(1),dk(2)
    tb = CMPLX(0.0d0, 0.0d0)
    dc(1) = CMPLX(0.0d0, 1.0d0)
    !Allocating array to transit Hamiltonian matrix
    ALLOCATE (tr1(nu_wa,nu_wa))
    ALLOCATE (tr2(nu_wa,nu_wa))
    ALLOCATE (tr3(nu_wa,nu_wa))
    ALLOCATE (tr4(nu_wa,nu_wa))
    !
    !Building up Hamiltonian matrix
    DO i = 1, nu_kp, 1
       tr1 = CMPLX(0.0d0, 0.0d0)
       DO j = 1, nu_ws, 1
          tr2 = CMPLX(0.0d0, 0.0d0)
          DO k = 1, nu_wa**2, 1
             l = hr(k+(j-1)*nu_wa**2,4)
             m = hr(k+(j-1)*nu_wa**2,5)
             dc(2) = CMPLX(hr(k+(j-1)*nu_wa**2,6), hr(k+(j-1)*nu_wa**2,7))
             tr2(l,m) = EXP(dc(1) * (kp(i,1)*hr(k+(j-1)*nu_wa**2,1)&
                                     +kp(i,2)*hr(k+(j-1)*nu_wa**2,2)&
                                     +kp(i,3)*hr(k+(j-1)*nu_wa**2,3)))&
                        * dc(2)
          END DO
          tr2 = tr2 / DBLE(nd(j))
          tr1 = tr1 + tr2
       END DO
       DO j = 1, nu_wa, 1
          l = j + (i-1) * nu_wa
          DO k = 1, nu_wa, 1
             tb(l,k) = tb(l,k) + tr1(j,k)
          END DO
       END DO
    END DO
    !
    !Initialising the array to store the Eigen vector matrix
    ec = CMPLX(0.0d0, 0.0d0)
    !
    !Initialising the array to store the Eigen value
    ev = 0.0d0
    !
    !Setting up all parameters used by ZHEEV subroutine
    n = nu_wa
    lda = nu_wa
    ALLOCATE (a(nu_wa,nu_wa))                             !Transitting Eigen vector matrix
    ALLOCATE (w(nu_wa))                                   !Transitting Eigen value
    ALLOCATE (work(2*nu_wa-1))
    lwork = 2 * nu_wa - 1
    ALLOCATE (rwork(3*nu_wa-2))
    !
    !Computing Hamiltonian matrix, Eigen vector matrix and Eigen value on each k point
    DO i = 1, nu_kp, 1
       !Initialising parameters used by ZHEEV subroutine
       a = CMPLX(0.0d0, 0.0d0)
       w = 0.0d0
       work = CMPLX(0.0d0, 0.0d0)
       rwork = 0.0d0
       !
       DO j = 1, nu_wa, 1
          a(j,:) = tb(j+(i-1)*nu_wa,:)
       END DO
       CALL ZHEEV('V','L',n,a,lda,w,work,lwork,rwork,info)
       DO j = 1, nu_wa, 1
          ec(1+(i-1)*nu_wa:i*nu_wa,j) = a(:,j)
       END DO
       ev(i,:) = w
    END DO
    !
    !Computing the velocity of the Hamiltonian matrix
    vh = CMPLX(0.0d0, 0.0d0)
    DO i = 1, nu_kp, 1
       tr1 = CMPLX(0.0d0, 0.0d0)
       tr2 = CMPLX(0.0d0, 0.0d0)
       DO j = 1, nu_ws, 1
          tr3 = CMPLX(0.0d0, 0.0d0)
          tr4 = CMPLX(0.0d0, 0.0d0)
          DO k = 1, nu_wa**2, 1
             l = hr(k+(j-1)*nu_wa**2,4)
             m = hr(k+(j-1)*nu_wa**2,5)
             dc(2) = CMPLX(hr(k+(j-1)*nu_wa**2,6), hr(k+(j-1)*nu_wa**2,7))
             !vx
             dc(3) = CMPLX(hr(k+(j-1)*nu_wa**2,1), 0.0d0)
             tr3(l,m) = EXP(dc(1) * (kp(i,1)*hr(k+(j-1)*nu_wa**2,1)&
                                     +kp(i,2)*hr(k+(j-1)*nu_wa**2,2)&
                                     +kp(i,3)*hr(k+(j-1)*nu_wa**2,3)))&
                        * dc(2) * dc(1) * dc(3)
             !
             !Vy
             dc(3) = CMPLX(hr(k+(j-1)*nu_wa**2,2), 0.0d0)
             tr4(l,m) = EXP(dc(1) * (kp(i,1)*hr(k+(j-1)*nu_wa**2,1)&
                                     +kp(i,2)*hr(k+(j-1)*nu_wa**2,2)&
                                     +kp(i,3)*hr(k+(j-1)*nu_wa**2,3)))&
                        * dc(2) * dc(1) * dc(3)
             !
          END DO
          tr3 = tr3 / DBLE(nd(j))
          tr4 = tr4 / DBLE(nd(j))
          !Vx
          tr1 = tr1 + tr3
          !
          !Vy
          tr2 = tr2 + tr3
          !
       END DO
       DO j = 1, nu_wa, 1
          l = j + (i-1) * nu_wa
          DO k = 1, nu_wa, 1
             vh(l,k) = vh(l,k) + tr1(j,k)
             vh(l,k+nu_wa) = vh(l,k+nu_wa) + tr2(j,k)
          END DO
       END DO
    END DO
    !
    !Computing the conductivity
    !
    !Allocating the arrays that store the eigen vector, velocity of Hamiltonian and normalising prefactor
    ALLOCATE (mt1(1,nu_wa))
    ALLOCATE (mt2(nu_wa,1))
    ALLOCATE (mt3(1,nu_wa))
    ALLOCATE (mt4(nu_wa,1))
    ALLOCATE (mt5(nu_wa,nu_wa))
    ALLOCATE (nm(nu_wa))
    !
    !Initialising the array that stores the conductivity values on all km(1) k points
    sc = CMPLX(0.0d0, 0.0d0)
    !
    !Computing the conductivity
    DO i = 1, nu_kp, 1
       !Normalized factor part
       DO j = 1, nu_wa, 1
          mt1(1,:) = DCONJG(ec(1+(i-1)*nu_wa:i*nu_wa,j))
          mt2(:,1) = ec(1+(i-1)*nu_wa:i*nu_wa,j)
          nm(j) = REAL(SUM(MATMUL(mt1,mt2)))
          WRITE (UNIT=21, FMT=*) SUM(MATMUL(mt1,mt2))
          nm(j) = 1.0d0 / DSQRT(nm(j))
       END DO
       !
       !Velocity of Hamiltonian
       IF (vd .EQ. 0) THEN
           mt5 = vh(1+(i-1)*nu_wa:i*nu_wa,1:nu_wa)
       ELSE
           mt5 = vh(1+(i-1)*nu_wa:i*nu_wa,1+nu_wa:2*nu_wa)
       END IF
       !
       !Conductivity part
       oc = CMPLX(0.0d0, 0.0d0)
       DO j = 1, nu_wa, 1
          gr(1) = CMPLX (1.0d0, 0.0d0) / CMPLX(fermi - ev(i,j), bv)
          ga(1) = CMPLX (1.0d0, 0.0d0) / CMPLX(fermi - ev(i,j), 0.0d0 - bv)
          gd(1) = gr(1) - ga(1)
          mt1(1,:) = DCONJG(ec(1+(i-1)*nu_wa:i*nu_wa,j))
          mt2(:,1) = ec(1+(i-1)*nu_wa:i*nu_wa,j)
          DO k = 1, nu_wa, 1
             gr(2) = CMPLX (1.0d0, 0.0d0) / CMPLX(fermi - ev(i,k), bv)
             ga(2) = CMPLX (1.0d0, 0.0d0) / CMPLX(fermi - ev(i,k), 0.0d0 - bv)
             gd(2) = gr(2) - ga(2)
             mt3(1,:) = DCONJG(ec(1+(i-1)*nu_wa:i*nu_wa,k))
             mt4(:,1) = ec(1+(i-1)*nu_wa:i*nu_wa,k)
             oc(1) = SUM(MATMUL(mt1,MATMUL(mt5,mt4)))*SUM(MATMUL(mt3,MATMUL(mt5,mt2)))*&
                     gd(1)*gd(2)*nm(j)*nm(j)*nm(k)*nm(k)*dk(1)*dk(2)
             write(unit=22,fmt=*) SUM(MATMUL(mt1,MATMUL(mt5,mt4))), SUM(MATMUL(mt3,MATMUL(mt5,mt2)))
             write(unit=25,fmt=*) gr(1),ga(1),gr(2),ga(2)
             write(unit=26,fmt=*) nm(j), nm(k), dk(1), dk(2)
             oc(2) = oc(2) + oc(1)
          END DO
       END DO
       sc = sc + oc(2)
       write(unit=23,fmt=*) oc(2),sc
       !
    END DO
    !
    !Deallocating arrays used for transitting Hamiltonian
    DEALLOCATE (tr1)
    DEALLOCATE (tr2)
    DEALLOCATE (tr3)
    DEALLOCATE (tr4)
    !
    !Deallocating arrays used by ZHEEV subroutine
    DEALLOCATE (a)
    DEALLOCATE (w)
    DEALLOCATE (rwork)
    DEALLOCATE (work)
    !
    !Deallocating arrays used to transitting eigen vectors
    DEALLOCATE (mt1)
    DEALLOCATE (mt2)
    DEALLOCATE (mt3)
    DEALLOCATE (mt4)
    !
    !Deallocating array used to transitting velocity of Hamiltonian
    DEALLOCATE (mt5)
    !
    !Deallocating array used to store the normalising prefactor
    DEALLOCATE (nm)
    !
    RETURN
    END SUBROUTINE HAMCON
    !
    END MODULE CAL
Kieran
  • 123
  • 1
  • 10
  • A quick visual grep tells me you have only allocated ka on rank zero and non of the other processes - there may be other issues. And why have you got a barrier just before the bcasts? The bcasts are already blocking. – Ian Bush Mar 29 '20 at 20:09
  • Thank you for the reply. I allocate ka on rank zero and broadcast it to other ranks. I did the same thing to all other variables; e.g. rea_c or fermi variables. Is this the problem? Regarding barrier, I just want to make sure all ranks reach the same time point so that I can broadcasting the variables to all other ranks. Is this also a problem? – Kieran Mar 29 '20 at 21:32
  • If you don't allocate the memory for the data on the other ranks, where can they store that data when they receive it? And the barrier is unnecessary as blocking collectives (e.g. bcast, reduce, barrier etc.) will not complete until all processes have done what they need to do. Unnecessary barriers are one of the classic causes of bad performance in MPI, don;t get into a bad habit. – Ian Bush Mar 29 '20 at 21:49
  • Unnecessary `MPI_Barrier()` are definitely bad. That being said, " blocking collectives (e.g. bcast, reduce, barrier etc.) will not complete until all processes have done what they need to do" should not be interpreted as these blocking collectives perform an implicit barrier. (`MPI_Allreduce()` does though) – Gilles Gouaillardet Mar 30 '20 at 00:04
  • 1
    OK, with language lawyer hat on yes, there is no guarantee that processes are synchronised on exit from a blocking collective. But the important point is they will not return locally until all the data transfer required by the local process is complete. Thus there is no need to provide a barrier either on entry to, or exit from, any blocking collective. – Ian Bush Mar 30 '20 at 07:01
  • Dear Ian Bush and Gilles Gouaillardet, Thank you for the suggestions. I have already moved ka array outside the IF sentence and allocate it in all ranks. I also delete the barrier sentences. However, I still get error message. I update the new modified conductivityMAINp.f90 and the new error message in my question. The conductivityCALp.f90 remains same. Would you please give me some suggestions on the solution? Thank you so much. – Kieran Mar 30 '20 at 10:08
  • 1
    You are still allocating nd only on rank zero and then broadcasting it. You also need to allocate ct_all on all processes - see https://stackoverflow.com/questions/13496510/is-there-anything-wrong-with-passing-an-unallocated-array-to-a-routine-without-a/13496808#13496808 for why and a solution. You also still have some necessary barriers. Haven't compiled and run your program so there may be other problems – Ian Bush Mar 30 '20 at 10:17
  • Dear Ian Bush, Thank you very much for the suggestion. I finally got my parallel code work. I update my new parallel code with some more questions in my post. Would you please have a look at them and give me some answers? Thank you so much. – Kieran Apr 01 '20 at 15:58

0 Answers0