I am trying to download some FASTQ files from ENA and would like to run my commands in a loop:

for (( i = 36; i <= 43; i++ ))
do
  wget ftp.sra.ebi.ac.uk/vol1/fastq/SRR705/006/SRR70591$i/SRR70591$i_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR705/006/SRR70591$i/SRR70591$i_2.fastq.gz
done

The output was:

--2018-08-19 22:37:14--  http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR705/006/SRR7059137/SRR70591.fastq.gz
Resolving ftp.sra.ebi.ac.uk (ftp.sra.ebi.ac.uk)... 193.62.192.7
Connecting to ftp.sra.ebi.ac.uk (ftp.sra.ebi.ac.uk)|193.62.192.7|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2018-08-19 22:37:14 ERROR 404: Not Found.

The error occurred because "37_1" is missing from the requested file name, SRR70591.fastq.gz.

I've tried various iterations of the command and found that it does not work if an underscore "_" directly follows "$i". Do you have any insight into how I can change the command to make it work?

Thanks guys

John Kugelman
JS Low
  • Use `${i}_1`; otherwise the shell takes your variable name to be `i_1`. (That is, you need to *brace enclose* `i` to make it explicit that `i` is your variable. `'_'` can be part of a variable name, and if you fail to enclose `i` in braces -- `${i}` -- the shell sees `i_1` as the variable name.) – David C. Rankin Aug 20 '18 at 02:51
  • Thanks David, I tried that before, but when I do that the loop stops: it only downloads i=36, not i=37, 38, and so on. So I'm a little unsure how to move forward. – JS Low Aug 20 '18 at 02:55
  • Check all your `i` occurrences. `/` cannot be part of a variable name so `$i/` is fine, but `$i_2` presents the same problem. So if the following character can legally be part of the variable name, put braces around the variable. (if you are not sure -- then put braces around it -- doesn't hurt) Also to debug, wrap your entire `"wget ..."` in an echo statement, e.g. `echo "wget ..."` and validate your commands are formed as you intend. Then remove the `echo` and turn it loose on the web... – David C. Rankin Aug 20 '18 at 02:57
  • Thanks very much David, I'll give it a shot. – JS Low Aug 20 '18 at 03:38
  • Sure, glad to help. This bites everybody learning to use variables within surrounding text when they begin scripting -- it's usually the way we all learn to use braces to protect the variable name to begin with `:)` – David C. Rankin Aug 20 '18 at 03:40

2 Answers

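The problem you are having is due to the variable 'i' being followed by an '_', which can be part of a variable name itself. The shell therefore reads $i_1 and $i_2 in your wget command as references to variables named i_1 and i_2. Neither is set, so both expand to an empty string, which is why the URL in your output ends in SRR70591.fastq.gz.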

While the question is not tagged bash, the following basic shell principles apply. It boils down to a basic understanding of the word, name and parameter (or variable) definitions and their requirements. For example, a word is defined as:

word    A sequence of characters considered as a single unit by the shell. 
        Also known as a token.

When used as a name, it has the following definition:

name    A word consisting only of alphanumeric characters and underscores, 
        and beginning with an alphabetic character or an underscore. 
        Also referred to as an identifier.

(Note carefully how "an underscore" is included in the definition of a name.)

Finally when the name is used as a variable or parameter, the following applies:

${parameter}
        The value of parameter is substituted. The braces are required 
        when parameter is a positional parameter with more than one digit, 
        or when parameter is followed by a character which is not to be 
        interpreted as part of its name... 

(Note above when "braces are required")
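
To see the effect of these definitions in isolation, here is a minimal sketch that builds the file name both ways without touching the network. The variable names without_braces and with_braces are only for illustration, and i_1 is explicitly unset so the result does not depend on your environment:

```shell
#!/usr/bin/env bash
# Illustrative only: i is set, i_1 is deliberately left unset.
i=37
unset i_1

# No braces: the shell looks up a variable named i_1, which is empty.
without_braces="SRR70591$i_1.fastq.gz"

# Braces: the variable name ends at i, and _1 is literal text.
with_braces="SRR70591${i}_1.fastq.gz"

echo "$without_braces"   # SRR70591.fastq.gz
echo "$with_braces"      # SRR7059137_1.fastq.gz
```

The first line of output matches the file name in the 404 error from the question.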

Putting the pieces together, your loop and wget command could be rewritten as:

for (( i = 36; i <= 43; i++ ))
do
  wget \
  ftp.sra.ebi.ac.uk/vol1/fastq/SRR705/006/SRR70591$i/SRR70591${i}_1.fastq.gz \
  ftp.sra.ebi.ac.uk/vol1/fastq/SRR705/006/SRR70591$i/SRR70591${i}_2.fastq.gz
done

(note: if you are unsure whether braces are required -- add them -- they do not hurt)
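
A related safeguard, which goes beyond what is discussed above, is the shell's nounset option (set -u), which turns an expansion of an unset variable into an error instead of an empty string. The sketch below tries the unbraced expansion in a subshell so the failure does not end the script; the result variable is just for illustration:

```shell
#!/usr/bin/env bash
# Sketch: detect the misparsed name i_1 with nounset instead of silently getting "".
i=37
unset i_1

# Run the expansion in a subshell; with set -u the subshell exits non-zero
# as soon as it expands the unset variable i_1.
if ( set -u; : "SRR70591$i_1.fastq.gz" ) 2>/dev/null
then
  result="expanded silently"
else
  result="caught unset variable i_1"
fi

echo "$result"
```

In a real script you would simply put set -u near the top and let the script stop with an "unbound variable" message.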

Also, as noted in the comments, an easy way to check that your commands are formed as you intend while developing and testing a script is to echo them first: wrap the entire command in quotes and echo it, e.g.

for (( i = 36; i <= 43; i++ ))
do
  echo "wget ...your full command..."
done

You can then verify your commands are formed as intended before turning your script loose on the web.
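
As a fuller sketch of that dry run, the following prints each wget command rather than running it. The base variable and the dry_run_cmd function are hypothetical helpers introduced here for readability; the path itself is the one from the question:

```shell
#!/usr/bin/env bash
# Dry run: print the wget commands instead of executing them.
base="ftp.sra.ebi.ac.uk/vol1/fastq/SRR705/006"   # path prefix from the question

dry_run_cmd() {
  # Print the wget command for the run number suffix given as $1.
  local i=$1
  echo "wget $base/SRR70591${i}/SRR70591${i}_1.fastq.gz $base/SRR70591${i}/SRR70591${i}_2.fastq.gz"
}

for (( i = 36; i <= 43; i++ ))
do
  dry_run_cmd "$i"
done
```

Once the eight printed commands look right, drop the echo inside the function so that wget actually runs.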

David C. Rankin

Try

for (( i = 36; i <= 43; i++ ))
do
  line1="ftp.sra.ebi.ac.uk/vol1/fastq/SRR705/006/SRR70591$i/SRR70591"
  line1+=$i
  line1+="_1.fastq.gz"
  line2="ftp.sra.ebi.ac.uk/vol1/fastq/SRR705/006/SRR70591$i/SRR70591"
  line2+=$i
  line2+="_2.fastq.gz"

  wget $line1 $line2
done
  • If you want to showcase good practices, proper quoting is a place to start -- `wget "$line1" "$line2"`. That said, I'm not sure this answer is maximally responsive as a whole -- it's only applicable to assignments, whereas using curly braces resolves the OP's issue in any context. – Charles Duffy Aug 20 '18 at 03:41