27

I have a bash script that is being used in a CGI. The CGI sets the $QUERY_STRING environment variable by reading everything after the ? in the URL. For example, http://example.com?a=123&b=456&c=ok sets QUERY_STRING=a=123&b=456&c=ok.

Somewhere I found the following ugliness:

b=$(echo "$QUERY_STRING" | sed -n 's/^.*b=\([^&]*\).*$/\1/p' | sed "s/%20/ /g")

which will set $b to whatever was found in $QUERY_STRING for b. However, my script has grown to have over ten input parameters. Is there an easier way to automatically convert the parameters in $QUERY_STRING into environment variables usable by bash?

Maybe I'll just use a for loop of some sort, but it'd be even better if the script was smart enough to automatically detect each parameter and maybe build an array that looks something like this:

${parm[a]}=123
${parm[b]}=456
${parm[c]}=ok

How could I write code to do that?

peterh
User1
  • I just noticed that I'm actually stuck on Bash 3. Does anyone have a simple, secure solution that will not involve associative arrays? – User1 Oct 13 '10 at 16:30
  • 1
    See my edited answer for an alternative to associative arrays (also be sure to read the page I linked to, [BashFAQ/006](http://mywiki.wooledge.org/BashFAQ/006)). – Dennis Williamson Oct 19 '10 at 01:32
  • this link will help you to solve your issue easily http://stackoverflow.com/questions/17021640/how-to-extract-the-data-using-sed-command – amar Jun 11 '13 at 05:23

16 Answers

55

Try this:

saveIFS=$IFS
IFS='=&'
parm=($QUERY_STRING)
IFS=$saveIFS

Now you have this:

parm[0]=a
parm[1]=123
parm[2]=b
parm[3]=456
parm[4]=c
parm[5]=ok

In Bash 4, which has associative arrays, you can do this (using the array created above):

declare -A array
for ((i=0; i<${#parm[@]}; i+=2))
do
    array[${parm[i]}]=${parm[i+1]}
done

which will give you this:

array[a]=123
array[b]=456
array[c]=ok

Edit:

To use indirection in Bash 2 and later (using the parm array created above):

for ((i=0; i<${#parm[@]}; i+=2))
do
    declare var_${parm[i]}=${parm[i+1]}
done

Then you will have:

var_a=123
var_b=456
var_c=ok

You can access these directly:

echo $var_a

or indirectly:

for p in a b c
do
    name="var_$p"
    echo ${!name}
done

If possible, it's better to avoid indirection since it can make code messy and be a source of bugs.
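
For Bash 3, where associative arrays are unavailable, the approach above can be sketched without indirection on the write side and without saving IFS. This is only a sketch under a few assumptions: `parse_query` is a hypothetical helper name, every key is expected to have an `=` (a bare key would misalign the pairs), and values are not URL-decoded.

```shell
#!/usr/bin/env bash
# Sketch: turn a query string into var_<key> variables without globbing.
# parse_query is a hypothetical helper name, not part of the answer above.
parse_query() {
    local -a pairs
    local i key
    # IFS applies only to this read, so no save/restore is needed,
    # and read does not glob the resulting words.
    IFS='&=' read -ra pairs <<<"$1"
    for ((i = 0; i < ${#pairs[@]}; i += 2)); do
        key=${pairs[i]}
        # Accept only keys that are safe to use in a variable name.
        [[ $key =~ ^[A-Za-z0-9_]+$ ]] || continue
        # printf -v assigns without eval (Bash 3.1 and later).
        printf -v "var_$key" '%s' "${pairs[i+1]}"
    done
}

parse_query 'a=123&b=456&c=ok'
echo "$var_a $var_b $var_c"
```

The key check matters because the key becomes part of a variable name; anything that fails it is skipped rather than assigned.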

Dennis Williamson
  • 2
    +1 for the `parm` array generation. But all methods presented to loop that array fail to properly handle repeated keys. Each occurrence will overwrite the previous. For example, a=1&a=2&a=x would result in parm[a]=x – MestreLion Aug 09 '11 at 20:44
  • 1
    @MestreLion: You can add logic to deal with the possibility of repeated keys, but you would need to decide how you want to deal with them. You could do first-precedent or last-precedent or some method of accumulation. – Dennis Williamson Aug 11 '11 at 22:14
  • 4
    `parm=($QUERY_STRING)` subjects the words resulting from the expansion of `$QUERY_STRING` to globbing, which is probably undesired. A more robust alternative that also saves you the trouble of saving and restoring `$IFS`: `IFS='&=' read -ra parm <<<"$QUERY_STRING"` It's better not to use all-uppercase shell-variable names in order to [avoid conflicts with environment variables and special shell variables](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_01). – mklement0 Aug 03 '16 at 03:36
  • @dmnc: You can set a variable on the same line as a command (e.g. `read`) to have that variable value active only in the environment of the command. The same is not true when doing two assignments on the same line. In that case, the value persists. So in the example in my answer it is necessary to save and restore `IFS`. There are other ways to avoid saving and restoring, such as using a subshell. So I rolled back your edit. – Dennis Williamson Dec 13 '16 at 16:38
17

You can break $QUERY down using IFS, for example by setting it to &:

$ QUERY="a=123&b=456&c=ok"
$ echo $QUERY
a=123&b=456&c=ok
$ IFS="&"
$ set -- $QUERY
$ echo $1
a=123
$ echo $2
b=456
$ echo $3
c=ok

$ array=($@)

$ for i in "${array[@]}"; do IFS="=" ; set -- $i; echo $1 $2; done
a 123
b 456
c ok

And you can save to a hash/dictionary in Bash 4+

$ declare -A hash
$ for i in "${array[@]}"; do IFS="=" ; set -- $i; hash[$1]=$2; done
$ echo ${hash["b"]}
456
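
One limitation of splitting each pair with IFS="=" is that a value containing `=` is cut short. A possible variation, sketched here with a hypothetical helper name `split_pairs`, splits on `&` first and then cuts each pair at the first `=` only, using parameter expansion:

```shell
#!/usr/bin/env bash
# Sketch: split on '&', then split each pair at the FIRST '=' only.
# split_pairs is a hypothetical helper name used for illustration.
split_pairs() {
    local pair
    local -a pairs
    # read -a does not perform globbing on the words it produces.
    IFS='&' read -ra pairs <<<"$1"
    for pair in "${pairs[@]}"; do
        [[ $pair == *=* ]] || continue    # skip items without '='
        # %%=* keeps the text before the first '='; #*= keeps the rest.
        printf '%s %s\n' "${pair%%=*}" "${pair#*=}"
    done
}

split_pairs 'a=123&b=x=y&c=*'
```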
ghostdog74
  • Except where you rely on word-splitting, please double-quote your variable references. Note that `set -- $QUERY` makes the words in `$QUERY` subject to globbing, which is probably undesired. It's better not to use all-uppercase shell-variable names in order to [avoid conflicts with environment variables and special shell variables](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_01). – mklement0 Aug 03 '16 at 03:31
  • @mklement0: This relies on word-splitting, in particular on splitting at `&`. Globbing isn't an issue as the query string is url-encoded. – MSalters Jun 29 '17 at 11:37
  • @MSalters: Thanks; I hadn't considered the _constrained contents of the strings_. The same applies to unquoted use of `$1`, `$2`, and `$3`, which happen to be fine _in this particular case_. I was only referring to _these_ variable references with "please double-quote", noting "except where you rely on word-splitting" (`set -- $QUERY` and `set -- $i`). To promote good habits, (a) `$1`, `$2`, and `$3` should still be double-quoted, and (b) the _general_ pitfall of unintended globbing is worth pointing out _directly in the answer_. – mklement0 Jun 29 '17 at 12:39
7

Please don't use the evil eval junk.

Here's how you can reliably parse the string and get an associative array:

declare -A param   
while IFS='=' read -r -d '&' key value && [[ -n "$key" ]]; do
    param["$key"]=$value
done <<<"${QUERY_STRING}&"

If you don't like the key check, you could do this instead:

declare -A param   
while IFS='=' read -r -d '&' key value; do
    param["$key"]=$value
done <<<"${QUERY_STRING:+"${QUERY_STRING}&"}"

Listing all the keys and values from the array:

for key in "${!param[@]}"; do
    echo "$key: ${param[$key]}"
done
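
The loop above stores keys and values still URL-encoded. A minimal sketch of adding decoding follows; `urldecode` is a hypothetical helper name and `query` is sample input. It assumes well-formed `%XX` escapes and no literal backslashes in the input, since `printf '%b'` would interpret those too, and command substitution drops trailing newlines from decoded values.

```shell
#!/usr/bin/env bash
# Requires Bash 4+ for associative arrays.
# urldecode is a hypothetical helper; it assumes well-formed %XX escapes.
urldecode() {
    local s=${1//+/ }            # '+' encodes a space in form data
    printf '%b' "${s//%/\\x}"    # rewrite %XX as \xXX and let printf expand it
}

declare -A param
query='name=John+Doe&city=New%20York&note=a%3Db'   # sample input
while IFS='=' read -r -d '&' key value && [[ -n $key ]]; do
    param["$(urldecode "$key")"]=$(urldecode "$value")
done <<<"${query}&"

echo "${param[name]} / ${param[city]} / ${param[note]}"
```

Decoding after the split, rather than before it, keeps an encoded `%26` or `%3D` inside a value from being mistaken for a separator.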
bolt
3

I packaged the sed command up into another script:

$ cat getvar.sh

s='s/^.*'${1}'=\([^&]*\).*$/\1/p'
echo "$QUERY_STRING" | sed -n "$s" | sed "s/%20/ /g"

and I call it from my main cgi as:

id=`./getvar.sh id`
ds=`./getvar.sh ds`
dt=`./getvar.sh dt`

...and so on; you get the idea.

This works for me even on a very basic BusyBox appliance (my PVR in this case).

Simon
3

To convert the contents of QUERY_STRING into bash variables, use the following command:

eval $(echo ${QUERY_STRING//&/;})

The inner step, echo ${QUERY_STRING//&/;}, substitutes all ampersands with semicolons producing a=123;b=456;c=ok which the eval then evaluates into the current shell.

The result can then be used as bash variables.

echo $a
echo $b
echo $c

The assumptions are:

  • values will never contain '&'
  • values will never contain ';'
  • QUERY_STRING will never contain malicious code
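
These assumptions can be checked before anything is evaluated. The following is only a sketch: `is_safe_query` is a hypothetical helper name and the permitted character set is an arbitrary conservative choice. It does not stop a request from overwriting an existing variable such as PATH or IFS, so an allow-list of names is still the safer route.

```shell
#!/usr/bin/env bash
# Sketch: refuse to eval unless the whole string is a list of plain
# name=value pairs. is_safe_query is a hypothetical helper name.
is_safe_query() {
    local pair='[A-Za-z_][A-Za-z0-9_]*=[A-Za-z0-9_.-]*'
    local re="^$pair(&$pair)*\$"
    [[ $1 =~ $re ]]
}

query='a=123&b=456&c=ok'    # sample input standing in for QUERY_STRING
if is_safe_query "$query"; then
    eval "${query//&/;}"
fi
echo "$a $b $c"
```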
Tai Paul
  • 1
    Ouch! Security alert! Never evaluate anything from the network. – ceving Apr 16 '19 at 09:09
  • @ceving ...except if it is your test/mock system, or the http requests are coming from your own program. – peterh Sep 19 '20 at 07:48
  • Most trivial crack: https://your.host/your.cgi?rm%20-rf%20%7e <-- this will make your web server execute an `rm -rf ~` :-) – peterh Sep 19 '20 at 07:49
  • I voted the post up because it is genially simple. Btw, I think a simple `eval "${QUERY_STRING//&/;}"` would be enough, you don't need to echo the variable and then substitute its output into the args of eval. – peterh Sep 19 '20 at 07:51
3

While the accepted answer is probably the most elegant one, there may be cases where security is critical and must also be clearly visible in your script.

In such a case I wouldn't use bash for the task at all, but if it has to be done for some reason, it might be better to avoid the newer array and dictionary features, because you can't be sure exactly how their contents are escaped.

In this case, the good old primitive solutions might work:

QS="${QUERY_STRING}"
while [ "${QS}" != "" ]
do
  nameval="${QS%%&*}"
  QS="${QS#"$nameval"}"
  QS="${QS#&}"
  name="${nameval%%=*}"
  val="${nameval#"$name"}"
  val="${val#=}"

  # and here we have $name and $val as names and values

  # ...

done

This iterates over the name-value pairs of QUERY_STRING, and no tricky escape sequence can circumvent it: double quotes are very strong in bash, and apart from a single variable substitution, which is fully under our control, nothing inside them is interpreted.

Furthermore, you can insert your own processing code at "# ...". This lets you accept only your own well-defined (and, ideally, short) list of allowed variable names. Needless to say, LD_PRELOAD shouldn't be one of them. ;-)

Furthermore, no variable is exported, and only QS, nameval, name and val are used.
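
As a sketch of what the "# ..." part could look like, here is the same loop with an allow-list in a `case` statement. The parameter names `id` and `lang` and the sample input in `qs` are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the loop with an allow-list; "id" and "lang" are
# hypothetical parameter names, and qs holds sample input.
qs='id=42&lang=en&LD_PRELOAD=/tmp/evil.so'
id='' lang=''
while [ -n "$qs" ]; do
    nameval=${qs%%&*}          # first name=value pair
    qs=${qs#"$nameval"}        # drop it from the remaining string
    qs=${qs#&}                 # drop the separator, if any
    name=${nameval%%=*}
    val=${nameval#"$name"}
    val=${val#=}
    case $name in
        id)   id=$val ;;
        lang) lang=$val ;;
        *)    ;;               # anything not listed is ignored
    esac
done
echo "id=$id lang=$lang"
```

Because the variable names are written out literally in the script, the request can never choose which variable gets assigned.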

peterh
2

Following the accepted answer, I made some changes to support array variables, as in this other question. I also added a decode function whose author I cannot find to give credit.

The code is somewhat messy, but it works. Changes and other recommendations would be greatly appreciated.

function cgi_decodevar() {
    [ $# -ne 1 ] && return
    local v t h
    # replace all + with whitespace and append %%
    t="${1//+/ }%%"
    while [ ${#t} -gt 0 -a "${t}" != "%" ]; do
        v="${v}${t%%\%*}" # digest up to the first %
        t="${t#*%}"       # remove digested part
        # decode if there is anything to decode and if not at end of string
        if [ ${#t} -gt 0 -a "${t}" != "%" ]; then
            h=${t:0:2} # save first two chars
            t="${t:2}" # remove these
            v="${v}"`echo -e \\\\x${h}` # convert hex to special char
        fi
    done
    # return decoded string
    echo "${v}"
    return
}

saveIFS=$IFS
IFS='=&'
VARS=($QUERY_STRING)
IFS=$saveIFS

for ((i=0; i<${#VARS[@]}; i+=2))
do
  curr="$(cgi_decodevar ${VARS[i]})"
  next="$(cgi_decodevar ${VARS[i+2]})"
  prev="$(cgi_decodevar ${VARS[i-2]})"
  value="$(cgi_decodevar ${VARS[i+1]})"

  array=${curr%"[]"}

  if  [ "$curr" == "$next" ] && [ "$curr" != "$prev" ] ;then
      j=0
      declare var_${array}[$j]="$value"
  elif [ $i -gt 1 ] && [ "$curr" == "$prev" ]; then
    j=$((j + 1))
    declare var_${array}[$j]="$value"
  else
    declare var_$curr="$value"
  fi
done
Community
badc0de
2

I would simply replace each & with ;. The query string then becomes something like:

a=123;b=456;c=ok

Now you just need to evaluate it and read your variables:

eval `echo "${QUERY_STRING}"|tr '&' ';'`
echo $a
echo $b
echo $c
Petras L
  • This is not only a security risk, but also fragile, given that values could themselves contain `;` or start with `~`. – mklement0 Aug 03 '16 at 03:40
1

A nice way to handle CGI query strings is to use Haserl which acts as a wrapper around your Bash cgi script, and offers convenient and secure query string parsing.

0

To bring this up to date, if you have a recent Bash version then you can achieve this with regular expressions:

q="$QUERY_STRING"
re1='^(\w+=\w+)&?'
re2='^(\w+)=(\w+)$'
declare -A params
while [[ $q =~ $re1 ]]; do
  q=${q##*${BASH_REMATCH[0]}}       
  [[ ${BASH_REMATCH[1]} =~ $re2 ]] && params+=([${BASH_REMATCH[1]}]=${BASH_REMATCH[2]})
done

If you don't want to use associative arrays then just change the penultimate line to do what you want. For each iteration of the loop the parameter is in ${BASH_REMATCH[1]} and its value is in ${BASH_REMATCH[2]}.

Here is the same thing as a function, in a short test script that iterates over the array and outputs the query string's parameters and their values:

#!/bin/bash
QUERY_STRING='foo=hello&bar=there&baz=freddy'

get_query_string() {
  local q="$QUERY_STRING"
  local re1='^(\w+=\w+)&?'
  local re2='^(\w+)=(\w+)$'
  while [[ $q =~ $re1 ]]; do
    q=${q##*${BASH_REMATCH[0]}}
    [[ ${BASH_REMATCH[1]} =~ $re2 ]] && eval "$1+=([${BASH_REMATCH[1]}]=${BASH_REMATCH[2]})"
  done
}

declare -A params
get_query_string params

for k in "${!params[@]}"
do
  v="${params[$k]}"
  echo "$k : $v"
done          

Note the parameters end up in the array in reverse order (it's associative so that shouldn't matter).

starfry
  • @starfy thanks for this but it does not work with a few characters that are admissable in parameter values, e.g. the simple hyphen "-". When parsing such parameters - e.g. p=foo-bar - only the first part of the value is returned (foo). – giacecco Feb 19 '17 at 11:40
0

Why not this?

    $ echo "${QUERY_STRING}"
    name=carlo&last=lanza&city=pfungen-CH
    $ saveIFS=$IFS
    $ IFS='&'
    $ eval $QUERY_STRING
    $ IFS=$saveIFS

Now you have this:

    name = carlo
    last = lanza
    city = pfungen-CH

    $ echo "name is ${name}"
    name is carlo
    $ echo "last is ${last}"
    last is lanza
    $ echo "city is ${city}"
    city is pfungen-CH
0

@giacecco

To include a hyphen in the regex, you could change two lines in the answer from @starfry.

Change these two lines:

  local re1='^(\w+=\w+)&?'
  local re2='^(\w+)=(\w+)$'

To these two lines:

  local re1='^(\w+=(\w|-)+)&?'
  local re2='^(\w+)=((\w|-)+)$'
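
An alternative that avoids `\w` (a GNU extension rather than POSIX ERE) is a bracket expression with the hyphen placed last. This is a sketch only: it uses a single regex instead of two, strips exactly the matched prefix on each pass, and `q` holds sample input. Extend the value class if other characters such as `.` or `%` need to be accepted.

```shell
#!/usr/bin/env bash
# Requires Bash 4+ (associative arrays). Sketch with sample input in q.
q='p=foo-bar&name=a_b&n=42'
# Key: letters, digits, underscore. Value: the same plus a literal hyphen.
re='^([[:alnum:]_]+)=([[:alnum:]_-]*)&?'
declare -A params
while [[ $q =~ $re ]]; do
    params[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
    q=${q:${#BASH_REMATCH[0]}}     # drop exactly the matched prefix
done

for k in "${!params[@]}"; do
    echo "$k : ${params[$k]}"
done
```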
L. Nozot
0

For all those who couldn't get it working with the posted answers (like me), this guy figured it out.

Can't upvote his post unfortunately...

Let me repost the code here real quick:

#!/bin/sh

if [ "$REQUEST_METHOD" = "POST" ]; then
  if [ "$CONTENT_LENGTH" -gt 0 ]; then
      read -n $CONTENT_LENGTH POST_DATA <&0
  fi
fi

#echo "$POST_DATA" > data.bin
IFS='=&'
set -- $POST_DATA

#2- Value1
#4- Value2
#6- Value3
#8- Value4

echo "Content-type: text/html"
echo ""
echo "<html><head><title>Saved</title></head><body>"
echo "Data received: $POST_DATA"
# The header has to be the first output, so the values go inside the body
echo "Values: $2 $4 $6 $8"
echo "</body></html>"
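
One caveat with the script above: `read -n` is not part of POSIX sh, and dash rejects it, so under a strict `/bin/sh` the body is never read. A possible workaround, sketched here with a hypothetical helper name `read_post_body`, uses `head -c` instead (widely available, though not itself required by POSIX) and checks that CONTENT_LENGTH is numeric first.

```shell
#!/bin/sh
# Sketch: read exactly CONTENT_LENGTH bytes of a POST body from stdin.
# read_post_body is a hypothetical helper name.
read_post_body() {
    [ "${REQUEST_METHOD:-}" = "POST" ] || return 0
    case ${CONTENT_LENGTH:-} in
        ''|*[!0-9]*) return 1 ;;    # missing or non-numeric length
    esac
    [ "$CONTENT_LENGTH" -gt 0 ] || return 0
    head -c "$CONTENT_LENGTH"
}

POST_DATA=$(read_post_body)
```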

Hope this is of help for anybody.

Cheers

kindaleek
0

Actually, I liked bolt's answer, so I made a version that works with BusyBox as well (ash in BusyBox does not support here-strings). This code accepts the key1 and key2 parameters; all others are ignored.

while IFS= read -r -d '&' KEYVAL && [[ -n "$KEYVAL" ]]; do
    case ${KEYVAL%%=*} in
        key1) KEY1=${KEYVAL#*=} ;;
        key2) KEY2=${KEYVAL#*=} ;;
    esac
done <<END
$(echo "${QUERY_STRING}&")
END
ZsZs
0

One can use bash-cgi.sh, which processes:

  • the query string into the $QUERY_STRING_GET key and value array;

  • the post request data (x-www-form-urlencoded) into the $QUERY_STRING_POST key and value array;

  • the cookies data into the $HTTP_COOKIES key and value array.

It requires Bash 4.0 or higher (to define the key and value arrays above).

All processing is done by bash alone (i.e. in a single process), without external dependencies or additional processes.

It has:

  • a check on the maximum length of data that can be passed to its input, as well as processed as query string and cookies;

  • the redirect() procedure, which redirects to the script itself with the extension changed to .html (useful for one-page sites);

  • the http_header_tail() procedure, which outputs the last two lines of the HTTP(S) response header;

  • a sanitizer of the $REMOTE_ADDR value against possible injections;

  • the parser and evaluator of the escaped UTF-8 symbols embedded into the values passed to the $QUERY_STRING_GET, $QUERY_STRING_POST and $HTTP_COOKIES;

  • the sanitizer of the $QUERY_STRING_GET, $QUERY_STRING_POST and $HTTP_COOKIES values against possible SQL injections (the escaping like the mysql_real_escape_string php function does, plus the escaping of @ and $).

It is available here:

https://github.com/VladimirBelousov/fancy_scripts

belousov
0

This works in dash, using a for-in loop:

IFS='&'
for f in $query_string; do
   value=${f#*=}
   key=${f%%=*}
   # if you need a variable (validate $key first) -> eval "qs_$key=\$value"
done
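
A slightly more defensive version of this loop, as a sketch only: globbing is switched off during the split, the key is validated before it is used in a variable name, and the `eval` references `$value` by name so its contents are never re-parsed as code. The `qs_` prefix and the sample `query_string` are assumptions for illustration.

```shell
#!/bin/sh
# Sketch for POSIX sh (dash). query_string holds sample input.
query_string='a=123&b=4 5&c=$(id)&bad-key=x'
old_ifs=$IFS
set -f                      # no globbing while splitting
IFS='&'
for f in $query_string; do
    key=${f%%=*}
    value=${f#*=}
    case $key in
        ''|*[!A-Za-z0-9_]*) continue ;;   # skip unsafe key names
    esac
    # \$value is expanded by eval as a plain variable reference,
    # so the value itself is assigned literally.
    eval "qs_$key=\$value"
done
IFS=$old_ifs
set +f
echo "$qs_a | $qs_b | $qs_c"
```

Note that an item without `=` ends up with its key and value both set to the whole item.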
Henning