Here is my pbs file:

#!/bin/bash 
#PBS -N myJob 
#PBS -j oe
#PBS -k o 
#PBS -V
#PBS -l nodes=hpg6-15:ppn=12
cd "${PBS_O_WORKDIR}"

./mycommand

According to the qsub documentation, adding the line #PBS -k o should let me follow the job's output in real time in a file named myJob.oJOBID in my home directory. However, when I inspect that file with tail -f, cat, or more while the job is running, it is empty. The output appears only after I terminate the job. Is there anything I should check to make the stream flush to the output file in real time?

2 Answers

By default, the files are created on the nodes and copied to your home directory when the job completes. The cluster admin can change this behavior by adding "$spool_as_final_name true" to the config file in the mom_priv directory on each node.
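
For an admin who wants to apply that setting, here is a minimal sketch of an idempotent helper. The function name is hypothetical, and the path in the usage comment is only an assumption about where mom_priv lives on a given install; it would have to be run on each node, and pbs_mom would presumably need to re-read its config afterwards.

```shell
#!/bin/bash
# Hypothetical helper: append "$spool_as_final_name true" to a MOM config file
# unless a $spool_as_final_name line is already present.
enable_spool_as_final_name() {
    local config_file="$1"
    if grep -q '^\$spool_as_final_name' "$config_file" 2>/dev/null; then
        echo "already set in $config_file"
    else
        # Single quotes keep the leading $ literal.
        printf '%s\n' '$spool_as_final_name true' >> "$config_file"
        echo "added to $config_file"
    fi
}

# Example call (assumed path, adjust to your install; needs root on the node):
#   enable_spool_as_final_name /var/spool/torque/mom_priv/config
```

Checking for an existing line first keeps the helper safe to run more than once on the same node.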

Torque MOM Configuration, parameters

chuck

Assuming you are allowed to log in to the node running your process (our cluster's admin allows this for the duration of the job; I am not sure how common that is), you can get real-time output by:

  1. Getting the PID of your process
  2. Browsing through the files that this process has opened with lsof -n -p <PID>, and finding the file whose name "looks like" that of a log. On our cluster the files are:
/cm/local/apps/pbspro-ce/var/spool/spool/[JOBID][server].OU
/cm/local/apps/pbspro-ce/var/spool/spool/[JOBID][server].ER

The .OU file is stdout and the .ER file is stderr. You can then run tail -f on either one to follow the output in real time.

The output of lsof can be quite long, so try grepping for your JOBID, or for the path fragment pbspro-ce/var/spool/.
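
The filtering step can be sketched as a small helper. The function name is hypothetical, and it assumes that the spool file names contain the job ID and end in .OU or .ER, as they do on our cluster; other installs may name them differently.

```shell
#!/bin/bash
# Hypothetical helper: read `lsof -n -p <PID>` output on stdin and print the
# paths of open files that contain the job ID and end in .OU or .ER.
find_spool_files() {
    local job_id="$1"
    # lsof prints the file name as the last whitespace-separated field
    # (this sketch does not handle paths that contain spaces).
    awk -v id="$job_id" 'index($NF, id) && $NF ~ /\.(OU|ER)$/ { print $NF }' | sort -u
}

# Usage on the compute node (PID and JOBID are placeholders):
#   pgrep -u "$USER" mycommand          # one way to find the PID
#   lsof -n -p <PID> | find_spool_files <JOBID>
#   tail -f <the printed path ending in .OU>
```

Reading lsof output from stdin rather than calling lsof inside the function makes it easy to try the filter on saved output first.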

Curious to know if this can be replicated in clusters other than our own.

mbiron