9

I am trying to download files from a database using wget and a URL, e.g.

wget "http://www.rcsb.org/pdb/files/1BXS.pdb"

So the format of the URL is: http://www.rcsb.org/pdb/files/($idnumber).pdb

But I have many files to download, so I wrote a bash script that reads ID numbers from a text file, builds the URL string, and downloads each file with wget.

#!/bin/bash

while read line
do
url="http://www.rcsb.org/pdb/files/$line.pdb"
echo -e $url
wget $url
done < id_numbers.txt

However, the URL string comes out as

.pdb://www.rcsb.org/pdb/files/4H80

So http is overwritten by .pdb. I cannot figure out why. Does anyone have an idea? How can I format it so that the URL is

"http://www.rcsb.org/pdb/files/($idnumber).pdb"

? Thanks a lot.

Note: This question was marked as a duplicate of 'How to concatenate strings in bash?', but I was actually asking about something else. I read that question before asking this one, and it turns out my problem was with preparing the txt file in Windows, not really with string concatenation. I edited the question title. I hope it is clearer now.

Gofrette

3 Answers

11

It sounds like your id_numbers.txt file has DOS/Windows-style line endings (a carriage return followed by a linefeed) instead of plain Unix line endings (just a linefeed). As a result, read treats only the linefeed as the end of the line, so $line ends with a carriage return, and that gets embedded in the URL. When the URL is printed, the carriage return moves the cursor back to the start of the line, so .pdb is drawn over http; wget, meanwhile, receives a URL with a stray carriage return in the middle.
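To confirm this kind of diagnosis, it can help to make the invisible character visible. The following is a minimal sketch, not part of the original script: the `line` value simulates one ID read from a CRLF file, and `od -c` dumps the resulting URL character by character so the carriage return shows up as `\r`.

```shell
#!/bin/bash
# Simulate a line read from a CRLF file: the ID followed by a carriage return.
# (Command substitution strips trailing newlines only, so the \r survives.)
line=$(printf '4H80\r')
url="http://www.rcsb.org/pdb/files/$line.pdb"

# Dump the URL one character at a time; the embedded carriage return
# appears as \r between the ID and the .pdb suffix.
printf '%s\n' "$url" | od -c
```

Running `od -c id_numbers.txt` directly on the input file should show the same `\r` before each `\n` if the file has Windows line endings.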

There are several ways to solve this. You could have bash trim the carriage return from the variable when you use it:

url="http://www.rcsb.org/pdb/files/${line%$'\r'}.pdb"

Or you could have read trim it by telling it that carriage return counts as whitespace (read will trim leading and trailing whitespace from what it reads):

while IFS=$'\r' read line

Or you could use a command like dos2unix (or whatever the equivalent is on your OS) to convert the id_numbers.txt file.
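Putting the first option into a complete loop might look like the sketch below. The function names `build_url` and `download_all` are hypothetical helpers introduced here for illustration, and the `-r` flag and blank-line check are additions that the original script does not have.

```shell
#!/bin/bash
# Hypothetical helper: build the download URL for one ID,
# dropping a single trailing carriage return if there is one.
build_url() {
    local id=${1%$'\r'}
    printf 'http://www.rcsb.org/pdb/files/%s.pdb\n' "$id"
}

# Hypothetical helper: download every ID listed in the file given as $1.
download_all() {
    local line
    # -r keeps backslashes literal; the || clause also processes a
    # final line that is not terminated by a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip lines that are empty once the carriage return is removed.
        [ -n "${line%$'\r'}" ] || continue
        wget "$(build_url "$line")"
    done < "$1"
}

# Usage (not run here, since it needs network access):
#   download_all id_numbers.txt
```

If dos2unix is not installed, `tr -d '\r' < id_numbers.txt > id_numbers_unix.txt` is a common way to produce a converted copy (the output filename is only an example).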

Gordon Davisson
  • This worked! Thank you very much. I had figured out that it was due to a carriage return, but not how to get rid of it. Much appreciated! – Gofrette Jan 31 '14 at 09:01
2

The -e option tells echo to interpret backslash escape sequences such as \n and \t (it is -n that suppresses the trailing newline); you do not need it here.
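Since echo's handling of options and escapes differs between shells, printf is a more predictable way to see the difference. This is a small illustrative sketch; the `text` variable is made up for the example.

```shell
#!/bin/bash
# A string containing a literal backslash followed by the letter t.
text='a\tb'

# %s prints the argument as-is, like bash's echo without -e.
printf '%s\n' "$text"

# %b interprets backslash escapes in the argument, like bash's echo -e,
# so \t becomes a tab character.
printf '%b\n' "$text"

# Leaving \n out of the format suppresses the trailing newline, like echo -n.
printf '%s' "$text"
```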

I also suspect that the file containing the IDs is malformed. On which OS did you create it?

Anyway, you can simplify your script this way:

#!/bin/bash

while read line
do
    wget "http://www.rcsb.org/pdb/files/$line.pdb"
done < id_numbers.txt

I was able to successfully test it with an id_numbers.txt file generated like so:

for i in $(seq 0 9) ; do echo "$i" >> id_numbers.txt ; done

aymericbeaumet
  • 1
    or id_nums.txt was created on windows and has \r\n line endings. Ah Gordon has picked up on this. `dos2unix file`. Good luck to all. – shellter Jan 30 '14 at 17:56
  • 1
    Yep I suspected this too, second sentence of my answer. – aymericbeaumet Jan 30 '14 at 19:21
  • Hi, you are right. I created the .txt file in Notepad++ on Windows, and that caused the problem. I had the `echo` command there to see the URL I was forming so I could troubleshoot. Thanks for the response. – Gofrette Jan 31 '14 at 09:05
  • In this case you just have to change the line delimiter (IFS) as said by @Gordon. My code is still valid if you want something clean (just add `IFS=$'\r'` before the while loop). – aymericbeaumet Jan 31 '14 at 17:25
0

Try this:

url="http://www.rcsb.org/pdb/files/"$line
url=$url".pdb"

For more info, check How to concatenate string variables in Bash?

Community