I have a Bash script containing three functions, and which one should run depends on the Group column in samplesheet.txt. Every line with Group=Both should be run through process_both, lines with Group=DNA through process_dna, and lines with Group=RNA through process_rna.

How do I specify in my Bash script which function to run based on the information in the samplesheet?

samplesheet.txt:

DNA             RNA             purity    Group
sample_DNA_1    sample_RNA_1    0.8       Both
sample_DNA_2    sample_RNA_2    0.5       Both
NA              sample_RNA_4    0.3       RNA
sample_DNA_3    NA              0.1       DNA

Code:

process_both() {
    local dna=$1
    local rna=$2
    local purity=$3
        singularity exec 
        sing.sif
        ...
        ...
        dna_id {dna}
        rna_id {rna}
        purity {purity}
        ...

        }
    
process_dna() {
    local dna=$1
    local rna=$2
    local purity=$3
        singularity exec 
        sing.sif
        ...
        ...
        dna_id {dna}
        purity {purity}
        ...

        }

process_rna() {
    local dna=$1
    local rna=$2
    local purity=$3
        singularity exec 
        sing.sif
        ...
        ...
        rna_id {rna}
        purity {purity}
        ...

        }


# Run the functions

tail +2 samplesheet.txt | while read dna rna purity
do
    process_both "$dna" "$rna" "$purity"
done

tail +2 samplesheet.txt | while read dna rna purity
do
    process_rna "$dna" "$rna" "$purity"
done    

tail +2 samplesheet.txt | while read dna rna purity
do
    process_dna "$dna" "$rna" "$purity"
done
user2300940

1 Answer

Use a simple conditional.

tail -n +2 samplesheet.txt | while read -r dna rna purity group
do
    case $group in
      'Both')
        process_both "$dna" "$rna" "$purity";;
      'RNA')
        process_rna "$dna" "$rna" "$purity";;
      'DNA')
        process_dna "$dna" "$rna" "$purity";;
      *)
        echo "$0: could not process $dna $rna $purity $group" >&2 ;;
    esac
done

You could use if/then/elif/then/elif/then but a case statement is easier to read and update. Don't let the unusual syntax scare you. Be mindful of the double semicolons which terminate each case.
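For comparison, here is a rough sketch of the same dispatch written with if/elif. The three process_* functions are replaced by stand-ins that only print their arguments, and dispatch is a hypothetical helper name introduced purely for illustration; neither is part of the original script.

```shell
# Stand-ins for the real functions; they only report what they were called with.
process_both() { echo "both: $1 $2 $3"; }
process_rna()  { echo "rna: $2 $3"; }
process_dna()  { echo "dna: $1 $3"; }

# dispatch is a hypothetical helper: same logic as the case statement,
# written as an if/elif chain.
dispatch() {
    dna=$1 rna=$2 purity=$3 group=$4
    if [ "$group" = "Both" ]; then
        process_both "$dna" "$rna" "$purity"
    elif [ "$group" = "RNA" ]; then
        process_rna "$dna" "$rna" "$purity"
    elif [ "$group" = "DNA" ]; then
        process_dna "$dna" "$rna" "$purity"
    else
        echo "$0: could not process $dna $rna $purity $group" >&2
        return 1
    fi
}

dispatch sample_DNA_1 sample_RNA_1 0.8 Both
dispatch NA sample_RNA_4 0.3 RNA
```

Every branch has to repeat the comparison against "$group", which is the main reason the case form tends to be easier to extend.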

Tangentially, for robustness, use tail -n and read -r. Also, the variable references inside your functions lack the dollar sign ({dna} should be $dna), and properly speaking they should also be quoted ("$dna").
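To illustrate why these two options matter, here is a small self-contained demonstration. The sample name containing a backslash is made up for the example; nothing like it appears in the samplesheet above.

```shell
# Skip the header: -n +2 means "start output at line 2".
printf 'header\nrow1\nrow2\n' | tail -n +2

# A field containing a backslash (hypothetical sample name).
line='sample\t1 rna 0.5 DNA'

# Without -r, read treats the backslash as an escape character and removes it.
read plain rest <<EOF
$line
EOF

# With -r, the field is kept exactly as written.
read -r raw rest <<EOF
$line
EOF

printf '%s\n' "$plain"   # samplet1
printf '%s\n' "$raw"     # sample\t1
```

tail -n +2 is the form specified by POSIX, so it is the safer spelling across systems.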

(I suppose the functions could be simplified to a single one which simply does not pass an argument if that variable is empty.

process () {
    dna=$1
    rna=$2
    purity=$3

    singularity exec sing.sif ... \
        ${dna:+dna_id "$dna"} ${rna:+rna_id "$rna"} \
        purity "$purity" ...
}

:
tail -n +2 samplesheet.txt | while read -r dna rna purity group
do    
    case $group in
      'Both') ;;  # do nothing
      'RNA') dna="";;
      'DNA') rna="";;
      *)
        echo "$0: could not process $dna $rna $purity $group" >&2
        continue;;
    esac
    process "$dna" "$rna" "$purity"
done

... but of course, if there are other differences which your abridged code does not reveal, you can't do this refactoring so easily.

Actually, now that the function is this simple, you could probably just inline it into the loop.)
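A minimal sketch of what that inlining might look like follows. Since the real singularity command line is elided in the question, run_tool is a hypothetical stand-in that just prints each argument in brackets, and run_samplesheet is an illustrative wrapper so the loop can be pointed at a temporary copy of the samplesheet.

```shell
# Hypothetical stand-in for the real command, whose full argument list is
# elided in the question; it prints each argument in brackets.
run_tool() { printf '[%s]' "$@"; printf '\n'; }

# Illustrative wrapper: takes the samplesheet path as its only argument.
run_samplesheet() {
    tail -n +2 "$1" | while read -r dna rna purity group
    do
        case $group in
          'Both') ;;  # keep both IDs
          'RNA') dna="";;
          'DNA') rna="";;
          *)
            echo "$0: could not process $dna $rna $purity $group" >&2
            continue;;
        esac
        # ${var:+...} expands to nothing when var is empty,
        # so the option and its value are dropped together.
        run_tool ${dna:+dna_id "$dna"} ${rna:+rna_id "$rna"} purity "$purity"
    done
}

# Temporary samplesheet for the demonstration, removed on exit.
sheet=$(mktemp)
trap 'rm -f "$sheet"' EXIT
cat > "$sheet" <<'EOF'
DNA             RNA            purity    Group
sample_DNA_1    sample_RNA_1    0.8      Both
NA              sample_RNA_4    0.3      RNA
sample_DNA_3    NA              0.1     DNA
EOF

run_samplesheet "$sheet"
```

In this sketch, the RNA row is passed only rna_id and purity, and the DNA row only dna_id and purity, because the emptied variable makes the corresponding ${var:+...} expansion disappear.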

tripleee