
I have a bash script in my Apache directories that downloads some pictures and optimizes them.

My script's path is /var/www/site/storage/optimazer/photo_optimazer.sh.

This script reads some command-line arguments from a text file and passes them to wget:

#!/usr/bin/env bash
..
THREAD="$(cat ${THREAD_FILE})";
$(command -v wget) $THREAD
...

Contents of ${THREAD_FILE}:

$ cat "${THREAD_FILE}"
--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0" -np -r -l 1 -A "jpg" --ignore-case -P /var/www/optimazer/public/optimazed -x http://example.com

I try to execute this script from another one, which I created at /usr/local/bin/optimaze.sh.

I had to do it this way because it will be run as a system service.

Here is the content of /usr/local/bin/optimaze.sh:

#!/usr/bin/env bash

cd /var/www/site/storage/optimazer/
$(command -v bash) photo_optimazer.sh

Now, when I execute optimaze.sh, it adds some extra quotes to my ${THREAD} content, which breaks the script, and I get errors like this:

--2021-07-30 12:56:59--  http://(windows/
Resolving (windows ((windows)... failed: Name or service not known.
wget: unable to resolve host address ‘(windows’
--2021-07-30 12:56:59--  http://nt/
Resolving nt (nt)... failed: Name or service not known.
wget: unable to resolve host address ‘nt’
--2021-07-30 12:56:59--  http://10.0;/
Resolving 10.0; (10.0;)... failed: Name or service not known.
wget: unable to resolve host address ‘10.0;’
--2021-07-30 12:56:59--  http://win64;/
Resolving win64; (win64;)... failed: Name or service not known.
wget: unable to resolve host address ‘win64;’
--2021-07-30 12:56:59--  http://x64;/
Resolving x64; (x64;)... failed: Name or service not known.
wget: unable to resolve host address ‘x64;’
--2021-07-30 12:56:59--  ftp://rv/90.0)
           => ‘/var/www/scraper/public/***/3/rv/.listing’
Resolving rv (rv)... failed: Name or service not known.
wget: unable to resolve host address ‘rv’
--2021-07-30 12:56:59--  http://gecko/20100101
Resolving gecko (gecko)... failed: Name or service not known.
wget: unable to resolve host address ‘gecko’
--2021-07-30 12:56:59--  http://firefox/90.0%22
Resolving firefox (firefox)... failed: Name or service not known.
wget: unable to resolve host address ‘firefox’

I tried set -ex in photo_optimazer.sh to see what happens:

 wget '--user-agent="Mozilla/5.0' '(Windows' NT '10.0;' 'Win64;' 'x64;' 'rv:90.0)' Gecko/20100101 'Firefox/90.0"' -np -A '"jpg,png"' --ignore-case --ignore-length -P /example/path -x http://example.com

It adds single quotes to my ${THREAD} output, and I don't know why!

I use GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
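For illustration, a minimal sketch that reproduces the behaviour outside the script (the variable name and the shortened user-agent value are illustrative, not taken from the real THREAD_FILE). Quote characters stored inside a variable's value are plain data: an unquoted `$thread` is split on whitespace only, and the literal `"` characters stay attached to the words they touch. The single quotes in the `set -x` trace are only bash's way of showing where each argument starts and ends; they are not passed to wget.

```bash
# Illustrative value: the double quotes are characters in the string, not shell syntax.
thread='--user-agent="Mozilla/5.0 (Windows NT 10.0)" -A "jpg"'

# Unquoted expansion: word splitting on whitespace, quote characters kept as data.
# Each resulting argument is printed between < and >.
printf '<%s>\n' $thread
```

This prints six separate arguments, starting with `<--user-agent="Mozilla/5.0>` and `<(Windows>`, which is the same fragmentation visible in the wget errors above.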

markp-fuso
Mosi
  • This is very much the issue described in [BashFAQ #50](http://mywiki.wooledge.org/BashFAQ/050). – Charles Duffy Jul 30 '21 at 17:42
  • BTW, why in the world would you use `$(command -v wget)` instead of just `wget`? It's slower (all command substitutions require forking off a subprocess), and offers no benefit (in fact, it _stops_ the shell from being able to make a note of and later reuse `wget`'s location, because it makes that lookup happen in a subshell, so the shell's in-memory state is all lost as soon as the subshell exits). – Charles Duffy Jul 30 '21 at 17:43
  • BTW, `set -e` is not necessarily a good idea. I'd strongly suggest going through the exercises in [BashFAQ #105](http://mywiki.wooledge.org/BashFAQ/105#Exercises), if not also reading the parable above them. – Charles Duffy Jul 30 '21 at 17:54
  • The answers to [this](https://stackoverflow.com/questions/26067249/reading-quoted-escaped-arguments-correctly-from-a-string) or [this](https://superuser.com/questions/1529226/get-bash-to-respect-quotes-when-word-splitting-subshell-output) might help. (But if possible, using an easier-to-parse format would be better.) (And avoid anything involving `eval` -- that way lies madness and weird bugs.) – Gordon Davisson Jul 31 '21 at 07:54
  • Right -- if the OP won't accept an answer describing a better storage format for their argument list, then their question comes down to parsing those quotes literally, which would make the question a duplicate of the above. – Charles Duffy Jul 31 '21 at 15:47

2 Answers


For this particular case one idea would be to feed each line to xargs.
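Why this helps: unless it is run with -0 or -d, xargs treats single quotes, double quotes and backslashes in its input as quoting, so `--user-agent="..."` reaches the command as one argument with the quotes stripped, which is exactly what an unquoted variable expansion does not do. A quick way to preview the split before involving wget, sketched with printf as a stand-in and a shortened sample line:

```bash
# Shortened sample line in the same format as the OP's THREAD_FILE.
line='--user-agent="Mozilla/5.0 (Windows NT 10.0)" -A "jpg" http://example.com'

# xargs does the quote-aware splitting; printf prints one <argument> per line.
xargs printf '<%s>\n' <<< "$line"
```

With GNU xargs this prints four arguments, the first being `<--user-agent=Mozilla/5.0 (Windows NT 10.0)>`.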

For sample data, I duplicated the line from the OP's $THREAD_FILE:

$ cat tfile
--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0" -np -r -l 1 -A "jpg" --ignore-case -P /var/www/optimazer/public/optimazed -x http://example.com
--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0" -np -r -l 1 -A "jpg" --ignore-case -P /var/www/optimazer/public/optimazed -x http://example.com

A first pass at xargs:

cat tfile | xargs -r wget

Or we can eliminate the unnecessary cat by feeding the file directly to xargs:

xargs -r -a tfile wget

A few variations on KamilCuk's comment/suggestion:

xargs -r < tfile wget
xargs -r wget < tfile
< tfile xargs -r wget

If we're dealing with a variable (as in the OP's example):

thread=$(head -n 1 tfile)
xargs -r wget <<< "${thread}"

Expanding on the <<< "${thread}" example, we can use it in a loop (e.g., when additional processing is needed for each line of a multi-line input file):

while read -r thread
do
    xargs -r wget <<< "${thread}"
done < tfile

All of these generate the following for each line processed:

--2021-07-31 13:50:41--  http://example.com/
Resolving example.com (example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com (example.com)|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Saving to: ‘/var/www/optimazer/public/optimazed/example.com/index.html.tmp’

example.com/index.html.tmp               100%[================================================================================>]   1.23K  --.-KB/s    in 0.001s

2021-07-31 13:50:41 (1.25 MB/s) - ‘/var/www/optimazer/public/optimazed/example.com/index.html.tmp’ saved [1256/1256]

Removing /var/www/optimazer/public/optimazed/example.com/index.html.tmp since it should be rejected.
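One caveat about the non-loop xargs variants above: unless told otherwise, xargs packs the arguments from all input lines into as few invocations as it can, so with the two-line tfile they will normally run a single wget with both argument sets concatenated, rather than one wget per line. The `while read` loop avoids that, and so does xargs's `-L 1` option, which runs the command once per non-blank input line. A quick way to see the difference, using `sh -c` as a stand-in for wget that just reports the arguments it received (the sample input is illustrative):

```bash
# Illustrative two-line input in the same quoted format as tfile.
input=$'-A "jpg" http://example.com\n-A "png" http://example.org'

# Stand-in for wget: report how many arguments it got and what they were.
# Without -L, xargs normally makes one call with all 6 arguments.
xargs -r sh -c 'echo "$# args: $*"' sh <<< "$input"

# With -L 1, xargs makes one call per input line (3 arguments each).
xargs -r -L 1 sh -c 'echo "$# args: $*"' sh <<< "$input"
```

Applied to the original, the per-line form would be `xargs -r -L 1 wget < tfile`.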
markp-fuso
  • `xargs -r -a tfile wget` could be just `xargs -r < tfile wget`. The `-a` option is for cases where you can't spawn a shell to do the `<` redirection for you. – KamilCuk Jul 31 '21 at 19:10
  • hmmm, haven't seen that one before and while I understand it ... the `< tfile`, in what looks like the middle of the command, sets off my `WTF?` alarm :-) I'll add it to the answer, thanks – markp-fuso Jul 31 '21 at 19:14
  • You can put it anywhere, even at the start. `< file cmd arg` is the same as `cmd < file arg`, which is the same as `cmd arg < file`. I like it as a cat replacement, i.e. instead of `cat file | cmd` do `< file cmd` – KamilCuk Jul 31 '21 at 19:17
  • fair enough ... I just confirmed `xargs -r wget < tfile` also works – markp-fuso Jul 31 '21 at 19:19
  • This technique is already taught in https://stackoverflow.com/questions/26067249/reading-quoted-escaped-arguments-correctly-from-a-string -- is there a reason to add a new answer vs closing the question as a duplicate? – Charles Duffy Aug 02 '21 at 14:02

If your arguments can't contain newlines, consider changing THREAD_FILE (better named all-lowercase, as thread_file, to stay out of the reserved all-caps namespace) to contain one argument per line, with no shell syntax whatsoever:

--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0
-np
-r
-l
1
-A
jpg
--ignore-case
-P
/var/www/optimazer/public/optimazed
-x
http://example.com
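If the existing quoted file is already in a format xargs can parse (as the OP's is), one way to generate the one-argument-per-line version is to let xargs do the unquoting once and print each argument on its own line. A sketch; the file names thread_old.txt and thread_args.txt are hypothetical, and the sample line is shortened:

```bash
# Hypothetical file names, created in the current directory.
old_thread_file=thread_old.txt
new_thread_file=thread_args.txt

# Sample input in the OP's quoted, single-line format.
printf '%s\n' '--user-agent="Mozilla/5.0 (Windows NT 10.0)" -np -A "jpg" http://example.com' > "$old_thread_file"

# xargs strips the quotes; printf writes one argument per line.
xargs printf '%s\n' < "$old_thread_file" > "$new_thread_file"
cat "$new_thread_file"
```

After this one-time conversion, the quoted file is no longer needed, and nothing at runtime has to parse shell-style quotes.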

Once you've done that, you can use (in bash 4.0 or later) readarray or mapfile to read each line of that file into a new array entry:

readarray -t wget_args <"$thread_file"

...and then expand that array onto your wget command line:

wget "${wget_args[@]}"
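Before pointing this at the real site, it can help to check what wget would actually receive. A self-contained sketch of that check; the path ./wget_args.txt is hypothetical, the file is written inline only so the example runs on its own, and the final wget call is left commented out so nothing is downloaded:

```bash
#!/usr/bin/env bash
# Hypothetical path: one wget argument per line, no shell quoting.
thread_file=./wget_args.txt
printf '%s\n' '--user-agent=Mozilla/5.0 (Windows NT 10.0)' -np -A jpg http://example.com > "$thread_file"

# bash 4.0+: one array element per line, trailing newline stripped (-t).
readarray -t wget_args < "$thread_file"

# Preview exactly what wget would receive: one <...> per argument.
printf '<%s>\n' "${wget_args[@]}"

# When the preview looks right, run the real command:
# wget "${wget_args[@]}"
```

The user-agent line, spaces and all, should show up as a single `<...>` entry.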

A note about that "reserved all-caps namespace" claim made above: strictly speaking, the POSIX standard only requires POSIX-specified tools to use all-caps names for the environment variables that modify those tools' behavior. However:

  • When an environment variable and a shell variable have the same name, any changes to the shell variable will also implicitly modify the environment variable.
  • The purpose of defining POSIX tools to use only all-caps variables is to make variable names containing at least one lower-case character safe for application use.

When all-caps variables are used for application-defined purposes, doing so discards the benefits of the restrictions POSIX places on built-in tools.
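A small sketch of the first bullet point (THREAD_FILE is reused here only as an example name): a variable that a script inherits from the environment is already exported, so a plain assignment to it, with no `export`, silently changes what every child process sees.

```bash
# The inner bash receives THREAD_FILE from its environment, so it is already exported.
THREAD_FILE=/original/path bash -c '
    THREAD_FILE=/changed/by/script   # plain assignment, no export
    sh -c "echo child sees: \$THREAD_FILE"
'
```

This prints `child sees: /changed/by/script`: the child process sees the modified value, not the original one.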

Charles Duffy