This question is a follow-up of another one asked some time ago.

I currently have this script:

download_data(){
    wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies --content-disposition $1
}

export -f download_data
DIR=$(dirname "$1")
<$1 xargs -d $'\n' -P 5 -n 1 -- bash -c 'for arg; do download_data $arg; done' _

In other words, I have a text file with a lot of URLs, one per line, and I feed each one of the URLs to wget to download the data.

What I want to do is to add another parameter to download_data(), in order to select the download location of the file. Something like:

download_data(){
    wget -P $1 --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies --content-disposition $2
}

export -f download_data
DIR=$(dirname "$1")
<$1 xargs -d $'\n' -P 5 -n 1 -- bash -c 'for arg; do download_data $DIR $arg; done' _

Which, in theory, would save the files in the location of my text file. But it does not work: the first argument passed into download_data() is always empty.

I'm quite new to bash and all this, so I am probably missing something simple...

Thank you for your help!

Miguel
  • I think that you want `$DIR` to be substituted before it is passed to `bash`, but need `$arg` to remain the same. Try `bash -c "for arg; do download_data $DIR \$arg; done"` – Javier Elices Dec 18 '17 at 21:27
  • Dropping the double quotes from the answer to your previous question by Charles Duffy is decidedly a poor decision. Even if you know the data doesn't require quoting, *we* don't know that, and the world already has way too many shell script snippets with broken quoting. – tripleee Dec 19 '17 at 12:48

2 Answers

If you can live with using GNU Parallel instead of xargs:

download_data(){
  wget -P "$1" --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies --content-disposition "$2"
}
export -f download_data
DIR=$(dirname "$1")
parallel -a "$1" -P5 download_data "$DIR" {}
Ole Tange
  • Thank you for your answer. I prefer the xargs approach because, if I'm not mistaken, it comes with the cygwin installation, while parallel involves downloading it afterwards. And since this script will be used by other people, it is one less step needed for anyone to use it. But thank you, nonetheless. – Miguel Dec 19 '17 at 17:04
  • Note that GNU Parallel is in essence a single Perl script, and if you are going to distribute your script, GPL allows you to distribute GNU Parallel next to it: http://oletange.blogspot.dk/2013/04/why-not-install-gnu-parallel.html – Ole Tange Dec 19 '17 at 23:47

The purpose of export is to make a variable visible to child processes, such as the bash instances that xargs starts.

You already export -f your function; similarly, export your DIR variable as well.
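As a minimal sketch of the point about export, the snippet below sets two variables and exports only one of them, then asks a child bash to print both. The variable names are hypothetical and exist only for this demonstration; the child is started with bash -c and a single-quoted script, the same way xargs starts it in the question.

```shell
# Hypothetical variable names, used only for this demonstration.
unexported_demo="not exported"
exported_demo="exported"
export exported_demo

# Single quotes keep the parent shell from expanding the variables,
# so the child bash has to look them up in its own environment.
child_output=$(bash -c 'echo "plain=[${unexported_demo}] exported=[${exported_demo}]"')
echo "$child_output"
```

The child sees an empty value for the variable that was not exported, which matches the empty first argument described in the question.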

However, you shouldn't be using uppercase for your private variables. And you broke the quoting. So,

download_data(){
    # add missing double quotes
    wget -P "$1" --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies --content-disposition "$2"
}

export -f download_data
# lowercase variable name
dir=$(dirname "$1")
# ... and export it
export dir
# ... and fix quoting some more
<"$1" xargs -d $'\n' -P 5 -n 1 -- bash -c 'for arg; do
    download_data "$dir" "$arg"; done' _

You may wonder about that _ at the end of the xargs command line, though. Obscurely, or elegantly, we could use that to smuggle in the value just as well. It will be used to populate $0 in the script inside the single quotes. Then, we don't need to put it in a named variable, or export that variable.

<"$1" xargs -d $'\n' -P 5 -n 1 -- bash -c 'for arg; do
    download_data "$0" "$arg"; done' "$(dirname "$1")"
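
To see how bash -c hands out the words that follow the script, here is a small sketch without xargs or wget. The values target_dir, url_a and url_b are placeholders, not real paths or URLs; the first one lands in $0 and the rest become the positional parameters that "for arg" loops over.

```shell
# The word after the script becomes $0; the remaining words become $1, $2, ...
# "target_dir", "url_a" and "url_b" are placeholder values for illustration.
positional_output=$(bash -c 'printf "%s\n" "zero=$0"; for arg; do printf "%s\n" "arg=$arg"; done' target_dir url_a url_b)
echo "$positional_output"
```

This is why the _ in the earlier command is needed as a filler: without it, the first URL supplied by xargs would end up in $0 and be skipped by the loop.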
tripleee
  • Thank you very much, it was exactly the export that was missing! Since the dirname doesn't change, I access $dir directly in the function. It works like a charm. Quick question: uppercasing the private variables is just a matter of convention or does it have a practical effect in Bash? Thank you very much for your help. – Miguel Dec 19 '17 at 17:01
  • Uppercase is reserved for system variables by POSIX. https://stackoverflow.com/questions/673055/correct-bash-and-shell-script-variable-capitalization – tripleee Dec 19 '17 at 18:49