To start, I am relatively new to shell scripting. I was wondering if anyone could help me create "steps" within a bash script. For example, I'd like to run one analysis and then have the script proceed to the next analysis with the output files generated in the first analysis.

For example, the command below generates the output file "filt_C2":

./sortmerna --ref ./rRNA_databases/silva-arc-23s-id98.fasta,./index/silva-arc-23s-id98.db:./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-id98.db:./rRNA_databases/silva-euk-18s-id95.fasta,./index/silva-euk-18s-id95.db:./rRNA_databases/silva-euk-28s-id98.fasta,./index/silva-euk-28s-id98.db:./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-database-id98.db:./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s.db --reads ~/path/to/file/C2.fastq --aligned ~/path/to/file/rrna_C2 --num_alignments 1 --other ~/path/to/file/filt_C2 --fastx --log -a 8 -m 64000

Once this step is complete, I would like to run another step that will use the output file "filt_C2" that was generated. I have been creating multiple bash scripts for each step; however, it would be more efficient if I could do each step in one bash file. So, is there a way to make a script that will complete Step 1, then move to Step 2 using the files generated in step 1? Any tips would be greatly appreciated. Thank you!

1 Answer

Welcome to bash scripting!

Here are a few tips:

  1. You can have multiple lines, as many as you like, in a bash script file.
  2. You may call other bash scripts (or any other executable programs) from within your shell script, just as Frank has mentioned in his answer.
  3. You may use variables to make your script more generic, say, if you want to name your result "C3" instead of "C2". (Not shown below)
  4. You may use bash functions if your script becomes more complicated, e.g. see https://ryanstutorials.net/bash-scripting-tutorial/bash-functions.php
  5. I recommend placing sortmerna in a directory that is listed in your PATH environment variable, and replacing the repeated ~/path/to/file with a variable (say WORKDIR) for consistency and flexibility.
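
Tips 3 and 4 can be sketched roughly as follows. This is only an illustration: SAMPLE, filter_reads, and count_reads are hypothetical names, the function bodies use placeholder commands instead of the real analysis tools, and WORKDIR points at a temporary directory so the sketch can run anywhere.

```shell
#!/bin/bash
# Sketch only: SAMPLE and the function names are hypothetical, and the
# function bodies are placeholders for the real analysis commands.

SAMPLE="C2"              # change to "C3" to process another sample
WORKDIR="$(mktemp -d)"   # stand-in for ~/path/to/file

# Step 1: placeholder for the filtering step; writes filt_<sample>
filter_reads() {
    local sample="$1"
    printf 'read1\nread2\n' > "$WORKDIR/filt_$sample"
}

# Step 2: placeholder for the follow-up analysis; reads filt_<sample>
count_reads() {
    local sample="$1"
    wc -l < "$WORKDIR/filt_$sample" > "$WORKDIR/result_$sample.txt"
}

filter_reads "$SAMPLE"
count_reads "$SAMPLE"
cat "$WORKDIR/result_$SAMPLE.txt"
```

Because each step takes the sample name as an argument, switching from C2 to C3 means changing one line instead of editing every path.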

For example, let’s say you name your script print_analysis.sh:

#!/bin/bash

# print_analysis.sh
# Written by Nikki E. Andrzejczyk, November 2018

# Set variables
WORKDIR=~/path/to/file

# Stage 1: Generate filt_C2 using SortMeRNA
./sortmerna --ref ./rRNA_databases/silva-arc-23s-id98.fasta,./index/silva-arc-23s-id98.db:./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-id98.db:./rRNA_databases/silva-euk-18s-id95.fasta,./index/silva-euk-18s-id95.db:./rRNA_databases/silva-euk-28s-id98.fasta,./index/silva-euk-28s-id98.db:./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-database-id98.db:./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s.db \
            --reads "$WORKDIR/C2.fastq" \
            --aligned "$WORKDIR/rrna_C2" \
            --num_alignments 1 \
            --other "$WORKDIR/filt_C2" \
            --fastx --log -a 8 -m 64000

# Stage 2: Process filt_C2 to generate result_C2
./stage2 "$WORKDIR/filt_C2" > "$WORKDIR/result_C2.txt"

# Stage 3: Print the result in result_C2
less "$WORKDIR/result_C2.txt"

Note how a trailing backslash \ at the end of a line lets me split the long sortmerna command across several shorter lines, and how # starts a human-readable comment.

This quick example does not implement every improvement mentioned above, but I hope it shows how to expand your bash script so that it runs multiple steps in one go.

Bash is actually a very powerful scripting and programming language. To learn more, you may want to start with a Bash tutorial such as the one linked in tip 4 above.

Hope this helps! If you have any other questions, or if I have misunderstood your question, please feel free to ask!

Cheers,

Anthony

Gordon Davisson
Anthony Fok
    There's one warning I would add: if any command ("step") in the script fails, it'll blindly continue with the rest of it, even if that doesn't make sense. (First analysis failed? Let's go ahead and run the second analysis on its nonexistent/empty output files!) If you add `set -e` at the beginning of the script (or change the first line to "`#!/bin/bash -e`"), it'll exit if any command fails (with some confusing exceptions, but that's an advanced topic...) This is generally safer, especially when you're starting out. – Gordon Davisson Nov 29 '18 at 22:37
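
    The behavior described in this comment can be seen in a small sketch. Here `false` is only a placeholder for a failing analysis step, and each variant runs in its own `bash -c` child process so that the outer script keeps going either way.

```shell
#!/bin/bash
# Sketch only: "false" stands in for a step that fails.

# Without set -e, the second command still runs after the failure.
without=$(bash -c 'false; echo "step 2 ran"')

# With set -e, the child shell exits at the failing command, so the echo
# never runs. "|| true" ignores the child's non-zero exit status here.
with=$(bash -c 'set -e; false; echo "step 2 ran"') || true

echo "without set -e: ${without:-<nothing>}"
echo "with set -e:    ${with:-<nothing>}"
```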
    Oh, one other note: shell scripting is easy to get started at, but there are also some bad traps that're easy to fall into. [shellcheck.net](https://www.shellcheck.net) is good at spotting common mistakes, so I recommend running your scripts through it as a sanity-check. (Speaking of which, I'm about to make a minor edit to this answer to fix one of the really common mistakes -- variable references without double-quotes around them.) – Gordon Davisson Nov 29 '18 at 23:39
  • Thanks for the helpful advice! That's exactly what I was looking for. While I'm here, I might as well ask: say I have multiple files I'd like to analyze, so besides C2.fastq (from above) I also have C1.fastq. I know I can tell the script to use both of these with C*.fastq. However, how can I change the output file name, filt_C2.fastq from above, to reflect the name of each input file? I know this is slightly off topic, but if you could point me in the right direction, that would be great. – Nikki E. Andrzejczyk Nov 30 '18 at 19:28
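
    One common way to approach this follow-up is a `for` loop over the matching files, using `basename` to strip the directory and the .fastq suffix so the output name can be built from each input name. A minimal sketch, assuming hypothetical variable names, a temporary WORKDIR with empty example files, and `cp` as a placeholder for the real analysis command:

```shell
#!/bin/bash
# Sketch only: the temporary directory, the empty input files, and the
# "cp" placeholder are assumptions for illustration.

WORKDIR="$(mktemp -d)"                        # stand-in for ~/path/to/file
touch "$WORKDIR/C1.fastq" "$WORKDIR/C2.fastq" # example inputs

for reads in "$WORKDIR"/C*.fastq; do
    # basename removes the directory and the .fastq suffix: C1, C2, ...
    sample="$(basename "$reads" .fastq)"
    out="$WORKDIR/filt_$sample"

    echo "processing $reads -> $out"
    cp "$reads" "$out"    # placeholder for the real analysis command
done
```

    In a real script, the placeholder line would be replaced by the analysis command, with "$reads" as its input and "$out" as its output.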