
I am looking for some general advice rather than a coding solution. When submitting a job via bsub I can retrieve a log of the stdout by specifying either of the following:

bsub -o log.txt      # sends stdout to log.txt
bsub -u me@email     # sends stdout to email

These are both great, but my program also creates a folder when the job runs, and that folder is stored on the remote server. Essentially I want to:

a) retrieve the folder and its contents; b) do this automatically when the job finishes

I could technically do a) using scp -r, but I would have to do it manually. That's not too bad if I get an email alert when the job finishes, but I'd still have to do it by hand.

So, on to b):

I can't see any bsub flag to retrieve the actual results, only stdout. I suppose I could have a script that sleeps for the expected job time (perhaps a bit longer, to be safe), something like:

#!/bin/bash

scp myfile.txt server:main/subfolder
ssh server "bsub -u my@email < myprogram.sh"
sleep <job-time>
scp -r server:main/subfolder result_folder

However, I am slightly concerned about being logged out and the script terminating before the job is finished.

does anyone have any suggestions?

I essentially want an interface (a website, in future) where a user can submit a file, the file is analysed remotely, the user is emailed when the job starts/finishes, the results are automatically retrieved back to the local machine/webserver, and the user gets an email saying they can pick up their results.

one step at a time though!

brucezepplin
  • If the execution nodes have access to shared storage you could do the copy inside the job script. If your cluster admin doesn't want you to hold the cpu while doing an I/O task, you could do the file copy as a [post exec command](http://www-01.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_admin/pre_post_exec_commands.dita) (e.g. bsub -Ep). If the execution node doesn't have access to suitable shared storage, LSF has a feature to [copy the output back to the submission node](http://www-01.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_users_guide/non_shared_about.dita). – Michael Closson Jul 14 '15 at 14:19
  • 2
    If the system has Platform Data Manager for LSF installed, you could stage the data out from within the job. – Hristo Iliev Jul 14 '15 at 20:12

2 Answers


You can tar your results directory to stdout so that it ends up in your logfile, then un-tar the logfile locally to recover the directory.

Add the tar czf - ... command to the end of your script.

If other output appears on stdout first, either move it to stderr, or echo some unique marker string before the tar, grep for the marker, and extract from that offset. Here's a small test of the principle:

marker='#magic'      # some unique string
log=/tmp/b           # your logfile
echo 'test' >/tmp/a  # just something to tar for this test

# -- in your script, at the end --
#   echo "$marker"; tar cf - /tmp/a
# -- equivalent in this test:
(echo 'hello'; echo "$marker"; tar cf - /tmp/a) >$log

# -- to recover the tar --
# grep -ab prints the byte offset of the marker line; awk adds the
# line's length plus the newline, giving the first byte of the archive
start=$(grep -ab "$marker" <$log | awk -F: '{print 1+$1+length($2)}')
dd skip=1 bs=$start <$log |  # skip the first $start bytes
tar tvf -                    # use tar xf - to actually extract
meuh
  • sorry @meuh - I am trying to get this to work. I can see that I am able to create /tmp/a and /tmp/b, which contain the words "hello" and "magic". However, in LSF there is an option to email stdout from the program. Are you saying it is possible to tar a results directory that the program creates, embed it into stdout so that it gets emailed, and then untar it on the client side to retrieve the results directory? – brucezepplin Jul 27 '15 at 11:09
  • in principle it is possible. Obviously, piping data into an email program will be limited by what that program is willing to pass, in both size and content. You may need to encode the binary output of tar with `base64` or a similar encoder. And if there is too much data, it will probably be truncated. – meuh Jul 27 '15 at 11:27
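In that spirit, here is a minimal sketch of a mail-safe round trip (the marker string and all paths are made up for the demo): the "remote" side writes a marker line followed by a base64-encoded tar stream, and the "local" side strips everything up to the marker, decodes, and extracts.

```shell
#!/bin/sh
marker='#magic'

# "Remote" side: create a results directory, then write other stdout,
# the marker line, and a base64-encoded tar of it to the logfile.
mkdir -p /tmp/results && echo 'data' >/tmp/results/out.txt
{ echo 'other stdout'; echo "$marker"; tar cf - -C /tmp results | base64; } >/tmp/log.txt

# "Local" side: drop everything up to and including the marker line,
# then decode and extract the archive.
mkdir -p /tmp/recovered
sed -e "1,/^$marker\$/d" /tmp/log.txt | base64 -d | tar xf - -C /tmp/recovered
# /tmp/recovered/results/out.txt now contains 'data'
```

Because the payload is base64, it survives being pasted through mail programs that would mangle raw binary, at the cost of roughly a third more bytes.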

You can submit the job in blocking mode (bsub -K). This makes the bsub command return only when the job completes or an error occurs.

Quote from documentation:

-K

Submits a job and waits for the job to complete. Sends the message "Waiting for dispatch" to the terminal when you submit the job. Sends the message "Job is finished" to the terminal when the job is done. If LSB_SUBK_SHOW_EXEC_HOST is enabled in lsf.conf, also sends the message "Starting on execution_host" when the job starts running on the execution host.

You are not able to submit another job until the job is completed. This is useful when completion of the job is required to proceed, such as a job script. If the job needs to be rerun due to transient failures, bsub returns after the job finishes successfully. bsub exits with the same exit code as the job so that job scripts can take appropriate actions based on the exit codes. bsub exits with value 126 if the job was terminated while pending.

You cannot use the -K option with the -I, -Ip, or -Is options.

Next, you could run scp or a similar program to copy the results back from the remote host automatically, without checking your email. :)

You could also run your wrapper script under nohup to prevent it from being killed if your session is logged out.
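Combining both suggestions with the script from the question, a minimal sketch might look like this (the server name, filenames, and email are placeholders taken from the question). The wrapper is written to a file here so it can then be launched under nohup; `bsub -K` blocks until the job ends, so the final copy only happens afterwards.

```shell
#!/bin/sh
# Write the wrapper script to a file; this is only a sketch and
# assumes the host/paths from the question exist.
cat >/tmp/submit_and_fetch.sh <<'EOF'
#!/bin/sh
set -e                                # stop if any step fails
scp myfile.txt server:main/subfolder  # stage the input file
# -K blocks until the job completes and exits with the job's exit code
ssh server "bsub -K -u me@email < myprogram.sh"
scp -r server:main/subfolder result_folder  # fetch the results
EOF
chmod +x /tmp/submit_and_fetch.sh
```

Run it as `nohup /tmp/submit_and_fetch.sh &` so a dropped session doesn't kill it; thanks to `set -e`, the final scp only runs if bsub (and hence the job) exited successfully.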

Gowtham