
I am trying to submit the following job script (.sh) to a SLURM cluster in order to run COMSOL:

#!/bin/bash  
#SBATCH --job-name=my_work
#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=20   
#SBATCH --mem=20G
#SBATCH --partition=my_partition
#SBATCH --time=4-0 
#SBATCH --no-requeue  
#SBATCH --exclusive       
#SBATCH -D $HOME 
#SBATCH --output=Lecho1_%j.out
#SBATCH --error=Lecho1_%j.err

cd /home/myuser/myfile/
module load intel/2019b
module load OpenMPI/4.1.1
module load COMSOL/5.5.0

comsol batch -mpibootstrap slurm -nn 20 -nnhost 20 -inputfile myfile.mph \
    -outputfile myfile.outout.mph -study std1 -batchlog myfile.mph.log

When I do so, I get the following error message:

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1743)......: channel initialization failed
MPID_Init(2137)......: PMI_Init returned -1

Can anyone tell me what it means and how to fix it completely?

Squashman
I am not sure what COMSOL does under the hood, but judging from the script and the log, you load the `OpenMPI` module while the error message comes from MPICH or one of its derivatives. Check the COMSOL documentation to find out which MPI library it expects (it could be Intel MPI, which is an MPICH derivative). – Gilles Gouaillardet Sep 15 '21 at 02:34
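
One rough way to confirm the mismatch described in the comment above is to look at the symbol names in the job's error file. The sketch below is only a heuristic: the function name `mpi_family_from_log` is hypothetical, and the patterns rely on the fact that `MPID_` and `MPIR_` are internal prefixes used by MPICH and its derivatives, while Open MPI messages usually mention its own `ompi_`, `orte_` or `opal_` layers.

```shell
# Heuristic sketch: guess which MPI family wrote an error log read from stdin.
# mpi_family_from_log is a hypothetical helper, not part of SLURM or COMSOL.
mpi_family_from_log() {
    # Read the whole log once so it can be matched against several patterns.
    log_text="$(cat)"
    if printf '%s\n' "$log_text" | grep -Eq 'MPID_|MPIR_'; then
        # MPID_/MPIR_ prefixes point to MPICH or a derivative such as Intel MPI.
        echo "mpich-like"
    elif printf '%s\n' "$log_text" | grep -Eq 'ompi_|orte_|opal_|Open MPI'; then
        # These names point to Open MPI's own layers.
        echo "openmpi-like"
    else
        echo "unknown"
    fi
}
```

It could be run as `mpi_family_from_log < Lecho1_<jobid>.err`, where `<jobid>` stands for the SLURM job ID. If the result is `mpich-like` while `module list` shows an Open MPI module, the loaded MPI is probably not the one COMSOL is actually using.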

1 Answer


The way you call COMSOL is incorrect. To run COMSOL on a cluster with SLURM, the submission script should look like this:

#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=COMSOL_JOB
#SBATCH --mem=200gb
#SBATCH --cpus-per-task=1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load COMSOL/5.5

comsol batch -mpirmk pbs -job b1 -alivetime 15 -recover \
-inputfile "mymodel.mph" -outputfile "mymodel.mph.out" \
-batchlog  "mymodel.mph.log"
rahjoo