
I am working on a Red Hat Linux server. My end goal is to run CRB-BLAST on multiple fasta files and have the results from those in separate directories.

My approach is to download the fasta files using wget and then run CRB-BLAST. I have multiple files and would like to download each one to its own directory (the directory name should perhaps come from the file names in the URL list), then run CRB-BLAST there.

Example URLs:

http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_CB_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_13_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_37_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_123_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_195_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_31_chr.v0.1.liftover.CDS.fasta.gz

Ideally, the file name determines the directory name, for example, TC_3370/.
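
For reference, that name extraction can be sketched with shell parameter expansion, as both answers below do (using the first URL above; the variable names are illustrative):

url=http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz
file=${url##*/}     # strip everything up to the last "/" -> TC_3370_chr.v0.1.liftover.CDS.fasta.gz
dir=${file%%_chr*}  # strip everything from the first "_chr" -> TC_3370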

I think there might be a solution with cat URL.txt | mkdir | cd | wget | crb-blast

Currently I just run the commands one at a time:

mkdir TC_3370

cd TC_3370/

wget http://assemblies/Genomes/final_assemblies/10x_meta_assemblies_v1.0/TC_3370_chr.v1.0.maker.CDS.fasta.gz

crb-blast -q TC_3370_chr.v1.0.maker.CDS.fasta.gz -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC

3 Answers


Try this Shellcheck-clean program:

#! /bin/bash -p

while read -r url; do
    # File name: everything after the last "/" in the URL
    file=${url##*/}
    # Directory name: everything before the first "_chr." (e.g. TC_3370)
    dir=${file%%_chr.*}
    mkdir -v -- "$dir"
    # The subshell keeps the "cd" local, so the loop stays in the original directory
    (
        cd "./$dir" || exit 1
        wget -- "$url"
        crb-blast -q "$file" -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
    )
done <URL.txt
pjh
  • How do I define url as a variable with multiple entries beforehand? I usually use R and would have this as a vector or a list. – Panchito May 31 '22 at 15:27
  • @Panchito, the corresponding data structure in Bash is the [array](https://www.gnu.org/software/bash/manual/bash.html#Arrays). See [Loop through an array of strings in Bash?](https://stackoverflow.com/q/8880603/4154375) for examples of defining and looping over arrays. – pjh May 31 '22 at 16:49
  • @Panchito, if you use arrays in your code (and even if you don't use arrays) make sure to use [Shellcheck](https://www.shellcheck.net/) to check it for problems. [Shellcheck](https://www.shellcheck.net/) is excellent at finding common problems in Bash (and other shell) code (and problems with using arrays in Bash are very common indeed). – pjh May 31 '22 at 16:53
  • @Panchito, see [Creating an array from a text file in Bash](https://stackoverflow.com/q/30988586/4154375) if you want to populate an array with a list of URLs stored in a file. – pjh May 31 '22 at 16:56
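
For instance, a minimal sketch of the array-based variant described in these comments, reusing the loop body from the answer above (only two of the question's URLs shown):

#! /bin/bash -p

# URLs held in a Bash array instead of being read from URL.txt
urls=(
    'http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz'
    'http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_CB_chr.v0.1.liftover.CDS.fasta.gz'
)

# "${urls[@]}" expands to each array element as a separate word
for url in "${urls[@]}"; do
    file=${url##*/}
    dir=${file%%_chr.*}
    mkdir -v -- "$dir"
    (
        cd "./$dir" || exit 1
        wget -- "$url"
        crb-blast -q "$file" -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
    )
done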

Another implementation:

#!/bin/sh

# Read URLs line by line from URL.txt (redirected into the loop below)
while read -r url
do
  # Get file name by stripping-out anything up to the last / from the url
  file_name=${url##*/}

  # Get the destination dir name by stripping anything from the first _chr
  dest_dir=${file_name%%_chr*}

  # Compose the wget output path
  fasta_path="$dest_dir/$file_name"

  if
    # Successfully created the destination directory AND
    mkdir -p -- "$dest_dir" &&
    # Successfully downloaded the file to the destination path
    # (-O/--output-document sets the download path; -o/--output-file would set the log file)
    wget --output-document="$fasta_path" --quiet -- "$url"
  then
    # Run CRB-BLAST with the downloaded fasta as query and the annotation
    # file (assumed to be present in the destination directory) as target
    fna_path="$dest_dir/TCV2_annot_cds.fna"
    crb-blast -q "$fasta_path" -t "$fna_path" -e 1e-20 -h 4 -o rbbh_TC
  else
    # Clean up: remove the destination directory if mkdir or wget failed
    rm -fr -- "$dest_dir"
  fi
  # URL.txt feeds the whole while loop
done < URL.txt
Léa Gris

Downloading files from a list is a task for the -i file option. If you have a file named, say, urls.txt with one URL per line, you can simply do

wget -i urls.txt

Note that this will put all the files in the current working directory, so if you want them in separate dirs, you would need to move them after wget finishes.
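
For example, a minimal sketch of that follow-up step, assuming the downloads match the TC_*_chr.* naming pattern from the question:

wget -i urls.txt

# Move each downloaded file into a directory named after its prefix before "_chr"
for f in TC_*_chr.*.fasta.gz; do
    dir=${f%%_chr*}
    mkdir -p -- "$dir"
    mv -- "$f" "$dir/"
done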

Daweo