
I have a bash script that takes two input samples (IPDID_DNA and IPDID_RNA) and a purity value (purity), which are analysed in pairs. I would like to write a loop based on a samplesheet (samplesheet.txt) that loops through the pairs, which are specified one per row. Any tips on how to automate this in a loop?

My input samplesheet could look something like this:

dna_sample_id   rna_tumor_sample_id purity
Sample1_DNA   Sample1_RNA   0.9
Sample2_DNA   Sample2_RNA   0.1

This is how it looks when inputting two samples for a single run, where I specify IPDID_DNA, IPDID_RNA and purity manually:

IPDID_DNA="Sample1_DNA"
IPDID_RNA="Sample1_RNA"
IPDID_Folder="IPDID_DNA"

dna_sample_id=${IPDID_DNA}
dna_sample_pair_id=${IPDID_DNA}
dna_sample_output_id=${IPDID_DNA}

rna_tumor_sample_id=${IPDID_RNA}
rna_tumor_sample_pair_id=${IPDID_RNA}
rna_tumor_sample_output_id=${IPDID_RNA}

purity="0.9"
singularity exec \
    --no-home \
    -B /data:/data \
    -W /data \
    docker.sif \
    bash process.sh \
    --output_directory ${IPDID_Folder} \
    --reference_fasta_file genome.fa \
    --dna_tumor_id ${dna_sample_id} \
    --dna_tumor_pair_id ${dna_sample_pair_id} \
    --dna_tumor_output_id ${dna_sample_output_id} \
    --dna_tumor_purity ${purity} \
    --rna_tumor_id ${rna_tumor_sample_id} \
    --rna_tumor_pair_id ${rna_tumor_sample_pair_id} \
    --rna_tumor_output_id ${rna_tumor_sample_output_id} \
    --rna_tumor_localapp_run_directory ${ANALYSIS_OUTPUT_DIR}
user2300940
  • As written you are passing off your work to us. It's confusing that your input doesn't match your output. You have two records in your input, and the first record seems to correspond to the output but with a different purity. – Allan Wind Apr 21 '23 at 07:02
  • Do you want to generate the output (as files) or execute the output? In any case, here is how you do it: make the sample a `function good_name() { IPDID_DNA=$1; IPDID_RNA=$2; purity=$3; # other than purity what you have now}`. Then write some code that calls your function once per row: `while read dna rna purity; do good_name "$dna" "$rna" "$purity"; done < <(tail -n +2 samplesheet.txt)` – Allan Wind Apr 21 '23 at 07:05
  • Thanks, the script runs a Docker container that generates an output directory matching the name in dna_sample_output_id. – user2300940 Apr 21 '23 at 07:18
  • So you want to execute that (not generate a file that you can execute)? – Allan Wind Apr 21 '23 at 07:21
  • Yes, but that should not matter for how the loop is made, should it? I updated the input to show how it actually looks. – user2300940 Apr 21 '23 at 07:22
  • One runs your program, the other generates a text file. – Allan Wind Apr 21 '23 at 07:23

1 Answer


I suggest you convert your script to a function, process(), and eliminate the extra variables. Then write a loop that extracts the data from your samplesheet and calls the function once per row:

#!/bin/bash

# Either set IPDID_Folder and ANALYSIS_OUTPUT_DIR here,
# or check that they were exported to your shell.

process() {
    local dna=$1
    local rna=$2
    local purity=$3

    singularity exec \
        --no-home \
        -B /data:/data \
        -W /data \
        docker.sif \
        bash process.sh \
        --output_directory "${IPDID_Folder}" \
        --reference_fasta_file genome.fa \
        --dna_tumor_id "${dna}" \
        --dna_tumor_pair_id "${dna}" \
        --dna_tumor_output_id "${dna}" \
        --dna_tumor_purity "${purity}" \
        --rna_tumor_id "${rna}" \
        --rna_tumor_pair_id "${rna}" \
        --rna_tumor_output_id "${rna}" \
        --rna_tumor_localapp_run_directory "${ANALYSIS_OUTPUT_DIR}"  
}

tail -n +2 samplesheet.txt | while read -r dna rna purity
do
    process "$dna" "$rna" "$purity"
done
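
One caveat with the loop above, shown here as a minimal sketch rather than a definitive fix: a command run inside a `while read` loop inherits the loop's stdin, so if it happens to read from stdin it can swallow the remaining rows. Whether `singularity exec` does so in this setup is an assumption I have not verified. The sketch reads the sheet on file descriptor 3 and gives the command `/dev/null` as stdin. The name `run_samplesheet` is hypothetical, and `process` is replaced by a stub that only prints its arguments.

```bash
#!/bin/bash

# Hypothetical stand-in for the real process() function above;
# it only prints the values it receives.
process() {
    printf 'dna=%s rna=%s purity=%s\n' "$1" "$2" "$3"
}

# run_samplesheet is an illustrative name, not part of the original script.
run_samplesheet() {
    local sheet=$1 header dna rna purity
    {
        # Discard the header row.
        read -r header <&3
        # Read rows from descriptor 3; the extra test keeps a last row
        # that has no trailing newline.
        while read -r dna rna purity <&3 || [ -n "$dna" ]
        do
            # Skip blank or incomplete rows.
            [ -n "$purity" ] || continue
            # Keep the command from reading the samplesheet by accident.
            process "$dna" "$rna" "$purity" < /dev/null
        done
    } 3< "$sheet"
}
```

Calling `run_samplesheet samplesheet.txt` with the stub prints one line per sample pair, which makes it easy to check the parsing before swapping in the real function.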
Allan Wind
  • Quoting the arguments to `process` is pointless when you don't quote anything inside the function. See [When to wrap quotes around a shell variable](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable). Also, for robustness, probably prefer `read -r`, though it probably doesn't matter in this example. – tripleee Apr 21 '23 at 08:07
  • @tripleee You're right, of course, and quoted variables as suggested. Funny, I never had a need for a `-r` option to `read`. – Allan Wind Apr 21 '23 at 18:04
  • The `-r` option is a POSIX addition which provides a workaround for the frankly insane behavior in the treatment of backslashes in the input of the original Bourne shell `read` implementation. By making it an option, they could introduce a change in the behavior without breaking existing scripts; but it should arguably be the default unless you specifically require the legacy semantics. – tripleee Apr 22 '23 at 10:39
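
To illustrate the backslash handling described in the last comment, here is a small sketch. The input string is made up for the demonstration and has nothing to do with the samplesheet.

```bash
#!/bin/bash

# Made-up input containing backslashes.
line='C:\new\table'

# Without -r, read treats each backslash as an escape character and drops it.
plain=$(printf '%s\n' "$line" | { read v; printf '%s' "$v"; })

# With -r, backslashes are kept literally.
raw=$(printf '%s\n' "$line" | { read -r v; printf '%s' "$v"; })

printf 'read:    %s\n' "$plain"   # prints: read:    C:newtable
printf 'read -r: %s\n' "$raw"     # prints: read -r: C:\new\table
```

Sample IDs rarely contain backslashes, so the difference is unlikely to show up with this samplesheet, but `read -r` costs nothing and avoids the surprise.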