
I am trying to submit a job to a cluster at our institute using Python scripts.

import subprocess

compile_cmd = 'ifort -openmp ran_numbers.f90 ' + fname \
              + ' ompscmf.f90 -o scmf.o'
subprocess.Popen(compile_cmd, shell=True)

Popen('qsub launcher',shell=True)

The problem is that the system hangs at this point. Are there any obvious mistakes in the script above? All the files mentioned in the code are in that directory (I have cross-checked that). qsub is the command used to submit jobs to our cluster, and fname is the name of a file that I created in the process.

Vaidyanathan

2 Answers


I have a script that I used to submit multiple jobs to our cluster using qsub. qsub typically takes job submissions in the form

qsub [qsub options] job

In my line of work, job is typically a bash (.sh) or Python (.py) script that actually calls the programs or code to be run on each node. If I wanted to submit a job called "test_job.sh" with a maximum walltime of 72 hours, I would do

qsub -l walltime=72:00:00 test_job.sh

This amounts to the following Python code:

from subprocess import call

qsub_call = "qsub -l walltime=72:00:00 %s"
call(qsub_call % "test_job.sh", shell=True)
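If you would rather avoid shell=True and keep the job ID that qsub prints, the same submission can be written with an argument list. This is a minimal sketch, assuming Python 3.5+ (for subprocess.run) and a PBS/Torque-style qsub that prints the new job ID on standard output; the helper names build_qsub_command and submit are made up for illustration.

```python
import subprocess


def build_qsub_command(job_script, walltime="72:00:00"):
    """Return the qsub argument list for one job script."""
    # A list (instead of a string with shell=True) avoids shell quoting problems.
    return ["qsub", "-l", "walltime=" + walltime, job_script]


def submit(job_script, walltime="72:00:00"):
    """Submit job_script with qsub and return what qsub prints (assumed: the job ID)."""
    result = subprocess.run(
        build_qsub_command(job_script, walltime),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError if qsub exits with a non-zero status
    )
    return result.stdout.strip()
```

Calling submit("test_job.sh") on the cluster would then return the identifier qsub printed, which you can later pass to qstat or qdel.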

Alternatively, what if you had a bash script that looked like this:

#!/bin/bash

filename="your_filename_here"
ifort -openmp ran_numbers.f90 "$filename" ompscmf.f90 -o scmf.o

and then submitted it via qsub job.sh?
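Rather than hardcoding the file name in job.sh, you could write that script once and pass the name in at submission time. The sketch below assumes your qsub supports -v variable=value (PBS and Torque do) to export a variable into the job's environment; job.sh would then drop the filename="your_filename_here" line and use "$filename" directly. The function names are hypothetical, and file names containing commas would need extra care because -v separates variables with commas.

```python
import subprocess


def qsub_with_filename(job_script, filename):
    """Build a qsub command that exports `filename` into the job's environment."""
    return ["qsub", "-v", "filename=" + filename, job_script]


def submit_all(job_script, fnames):
    """Submit the same job script once per input file."""
    for fname in fnames:
        # check=True stops at the first submission that qsub rejects
        subprocess.run(qsub_with_filename(job_script, fname), check=True)
```

With this, submit_all("job.sh", fnames) replaces generating one script per file.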


Edit: Honestly, the optimal job-queueing scheme varies from cluster to cluster. One simple way to streamline your job submission scripts is to find out how many CPUs are available on each node. Some of the more recent queueing systems let you submit many single-CPU jobs and will pack them onto as few nodes as possible; however, some older clusters won't do that, and submitting many individual jobs is frowned upon.

Say that each node in your cluster has 8 CPUs. You could write your script like this:

#!/bin/bash
#PBS -l nodes=1:ppn=8

for ((i=0; i<8; i++))
do
    ./myjob.sh filename_${i} &
done
wait

This starts 8 jobs on one node at once (& runs each command in the background), and wait blocks until all 8 have finished. This may be optimal for clusters with many CPUs per node (for example, one cluster that I used has 48 CPUs per node).
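If you have more input files than cores on one node, the two ideas can be combined: group the files into batches of ppn and generate one packed script per batch. This is a minimal sketch, assuming 8 cores per node and the per-file worker ./myjob.sh from the example above; chunk and packed_script are illustrative names, not part of any library.

```python
import shlex


def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def packed_script(filenames, ppn=8):
    """Return the text of a PBS script that runs one background task per file."""
    lines = ["#!/bin/bash", "#PBS -l nodes=1:ppn=%d" % ppn]
    for fname in filenames:
        # shlex.quote protects file names containing spaces or shell characters
        lines.append("./myjob.sh %s &" % shlex.quote(fname))
    lines.append("wait")  # keep the job alive until every background task exits
    return "\n".join(lines) + "\n"
```

Each string returned by packed_script(group) for group in chunk(filenames, 8) can be written to its own .sh file and passed to qsub, so the scheduler sees one job per node instead of one per file.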

Alternatively, if submitting many single-core jobs is optimal and your submission code above isn't working, you could use Python to generate bash scripts to pass to qsub.

#!/usr/bin/env python
import os
from subprocess import call

# Top-level directory to search for input files (adjust to your setup).
directory = '.'

bash_lines = ['#!/bin/bash\n', '#PBS -l nodes=1:ppn=1\n']
bash_name = 'myjob_%i.sh'
job_call = 'ifort -openmp ran_numbers.f90 %s ompscmf.f90 -o scmf.o &\n'
qsub_call = 'qsub myjob_%i.sh'

# Collect every .txt file below `directory`.
filenames = [os.path.join(root, f) for root, _, files in os.walk(directory)
                                   for f in files if f.endswith('.txt')]
for i, filename in enumerate(filenames):
    # Write one single-core job script per input file...
    with open(bash_name % i, 'w') as bash_file:
        bash_file.writelines(bash_lines + [job_call % filename, 'wait\n'])
    # ...then submit it.
    call(qsub_call % i, shell=True)
wflynny
  • `Popen` is more general than `call`, so both should work. However, I find `call` easier to use for simple commands that I need done immediately because [`call` does block](http://stackoverflow.com/questions/7681715/whats-the-difference-between-subprocess-popen-and-call-how-do-you-use-them-to). Still, I'm not sure why you are using Python to submit jobs. Are you iterating over many `fnames` and submitting many individual jobs? That seems less than optimal for most clusters and queueing systems, and there's probably a better way. – wflynny Oct 17 '13 at 17:10
  • Yes, I am iterating over many `fnames` and submitting many individual jobs. I am not aware of a better method. Could you please give some guidelines along those lines? – Vaidyanathan Oct 17 '13 at 17:20
  • I added some additional comments to my answer. You may want to contact whoever runs the cluster for guidelines and best practices. – wflynny Oct 17 '13 at 18:32
  • Instead of hardcoding the 8 cores in your for-loop, you can use `for ((i=0; i<${PBS_NUM_PPN}; i++))`. – Julian Helfferich Dec 21 '16 at 14:48
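The same idea carries over to a job script written in Python: read PBS_NUM_PPN and run at most that many workers at once. This is a sketch under assumptions: PBS_NUM_PPN is set by Torque, but other batch systems may not define it, hence the fallback; the function names and the ./myjob.sh worker are hypothetical.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def cores_per_node(default=1):
    """Cores granted on this node: PBS_NUM_PPN if set to a positive integer, else `default`."""
    value = os.environ.get("PBS_NUM_PPN", "")
    return int(value) if value.isdigit() and int(value) > 0 else default


def run_all(filenames, worker="./myjob.sh"):
    """Run `worker <file>` for every file, with at most cores_per_node() running at once."""
    with ThreadPoolExecutor(max_workers=cores_per_node()) as pool:
        # The threads only wait on child processes, so the GIL is not a bottleneck.
        return list(pool.map(lambda f: subprocess.call([worker, f]), filenames))
```

Unlike the bash loop, this handles any number of files: extra files wait in the pool's queue until a core frees up.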

Did you get any errors? It seems you missed the "subprocess." prefix on the second Popen; unless you also did from subprocess import Popen, that line raises a NameError.
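A minimal sketch of the question's two steps with the prefix fixed. Note also that subprocess.Popen starts ifort and returns immediately, so qsub can be called before scmf.o has been written; subprocess.check_call instead waits for each command and raises CalledProcessError if it exits with a non-zero status. The launcher name comes from the question; build_compile_command and compile_and_submit are hypothetical helpers.

```python
import subprocess


def build_compile_command(fname):
    """The question's ifort command, as an argument list."""
    return ["ifort", "-openmp", "ran_numbers.f90", fname,
            "ompscmf.f90", "-o", "scmf.o"]


def compile_and_submit(fname, launcher="launcher"):
    # Blocks until ifort finishes, so qsub never runs before the build is done.
    subprocess.check_call(build_compile_command(fname))
    # Fully qualified call: a bare Popen(...) fails after only `import subprocess`.
    subprocess.check_call(["qsub", launcher])
```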

Leo