
I'm completely new to using HPCs and SLURM, so I'd really appreciate some guidance here.

I need to iteratively run a command that looks like this:

kallisto quant -i '/home/myName/genomes/hSapien.idx' \
               -o "output-SRR3225412"                 \
                         "SRR3225412_1.fastq.gz"       \
                         "SRR3225412_2.fastq.gz"

where the SRR3225412 part will be different in each iteration.

The problem is, as I found out, that I can't just append this to the end of an sbatch command:

sbatch --nodes=1          \
       --ntasks-per-node=1 \
       --cpus-per-task=1    \
         kallisto quant -i '/home/myName/genomes/hSapien.idx' \
                        -o "output-SRR3225412"                 \
                                  "SRR3225412_1.fastq.gz"       \
                                  "SRR3225412_2.fastq.gz"

This command doesn't work. I get the error

sbatch: error: This does not look like a batch script.  The first
sbatch: error: line must start with #! followed by the path to an interpreter.
sbatch: error: For instance: #!/bin/sh

I wanted to ask, how do I run the sbatch command, specifying its run parameters, and also adding the command-line arguments for the kallisto program I'm trying to use? In the end I'd like to have something like

#!/bin/bash

for sample in ...
do
    sbatch --nodes=1          \
           --ntasks-per-node=1 \
           --cpus-per-task=1    \
             kallistoCommandOnSample --arg1 a1 \
                                     --arg2 a2 arg3 a3
done
user3666197
Zuhaib Ahmed
  • Does this answer your question? [Running a binary without a top level script in SLURM](https://stackoverflow.com/questions/33400769/running-a-binary-without-a-top-level-script-in-slurm) – Carles Fenoy Oct 28 '20 at 16:15

1 Answer


The error sbatch: error: This does not look like a batch script. occurs because sbatch expects a submission script. This is a batch script, typically a Bash script, in which comment lines starting with #SBATCH are interpreted by Slurm as options.

So the typical way of submitting a job is to create a file, let's name it submit.sh:

#! /bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1

kallisto quant -i '/home/myName/genomes/hSapien.idx' \
               -o "output-SRR3225412"                 \
                         "SRR3225412_1.fastq.gz"       \
                         "SRR3225412_2.fastq.gz"

and then submit it with

sbatch submit.sh

If you have multiple similar jobs to submit, it is beneficial for several reasons to use a job array. The loop you want to create can be replaced with a single submission script looking like

#! /bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --array=0-9 # Replace 9 with (number of samples - 1); Bash arrays are 0-indexed

SAMPLES=(...) # here put what you would loop over
CURRSAMPLE=${SAMPLES[$SLURM_ARRAY_TASK_ID]}
kallisto quant -i '/home/myName/genomes/hSapien.idx' \
               -o "output-${CURRSAMPLE}"              \
                         "${CURRSAMPLE}_1.fastq.gz"    \
                         "${CURRSAMPLE}_2.fastq.gz"
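If the list of samples is long, one common variation is to keep the sample IDs in a text file, one per line, and let each array task pick its own line. The following is only a sketch under assumptions: the file name samples.txt, the script name submit_array.sh, and the helper sample_for_task are hypothetical and not part of the original question.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1

# Hypothetical helper: print line number $2 (1-based) of the file $1.
sample_for_task() {
    local list_file="$1" task_id="$2"
    sed -n "${task_id}p" "$list_file"
}

# Slurm sets SLURM_ARRAY_TASK_ID inside each array task.
# The guard keeps the script harmless when run outside a job array.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    # samples.txt is an assumed file with one sample ID per line.
    CURRSAMPLE=$(sample_for_task samples.txt "$SLURM_ARRAY_TASK_ID")
    kallisto quant -i '/home/myName/genomes/hSapien.idx' \
                   -o "output-${CURRSAMPLE}"              \
                      "${CURRSAMPLE}_1.fastq.gz"           \
                      "${CURRSAMPLE}_2.fastq.gz"
fi
```

Because the array range is not hard-coded here, it can be given at submission time, for example with sbatch --array=1-"$(wc -l < samples.txt)" submit_array.sh, assuming every line of samples.txt ends with a newline. Task IDs start at 1 in this variant because sed counts lines from 1.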

As pointed out by @Carles Fenoy, if you do not want to use a submission script, you can use the --wrap parameter of sbatch:

sbatch --nodes=1          \
       --ntasks-per-node=1 \
       --cpus-per-task=1    \
       --wrap "kallisto quant -i '/home/myName/genomes/hSapien.idx' \
                              -o 'output-SRR3225412'                 \
                                        'SRR3225412_1.fastq.gz'       \
                                        'SRR3225412_2.fastq.gz'"
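For the loop sketched in the question, the --wrap form can be placed inside an ordinary Bash for loop. This is a minimal sketch, not a tested recipe: the second sample ID, the build_cmd helper, and the DRY_RUN variable are made up for illustration. By default it only prints what would be submitted.

```shell
#!/bin/bash
# Hypothetical sample IDs; SRR3225413 is invented for illustration.
samples=(SRR3225412 SRR3225413)

# Hypothetical helper: build the kallisto command string for one sample ID.
build_cmd() {
    local sample="$1"
    echo "kallisto quant -i '/home/myName/genomes/hSapien.idx' -o 'output-${sample}' '${sample}_1.fastq.gz' '${sample}_2.fastq.gz'"
}

for sample in "${samples[@]}"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only show the command that would be wrapped.
        echo "would submit: $(build_cmd "$sample")"
    else
        # Run with DRY_RUN=0 to actually submit one job per sample.
        sbatch --nodes=1          \
               --ntasks-per-node=1 \
               --cpus-per-task=1    \
               --wrap "$(build_cmd "$sample")"
    fi
done
```

This submits one independent job per sample, whereas the job array above groups all samples under a single submission.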
user3666197
damienfrancois