
I know this has been asked a billion times, but I still haven't found the optimal solution for my specific case.

I'm receiving a string like this:

VAR1="some text here" VAR2='some another text' some script --with --some=args

How do I split the string like this (preferably in pure bash)?

VAR1="some text here"
VAR2='some another text'
some script --with --some=args

set -- $str results in VAR1="some

set -- "$str" returns the entire string

eval set -- "$str" results in VAR1=some text here

Sure, I could add the quotes back to the strings returned by eval, but I get highly untrusted input, so eval is not an option at all.

Important: there can be anywhere from zero to an unlimited number of VARs, and they can be single- or double-quoted.

Also, VAR is just a placeholder name here; the names can in fact be anything.

Thanks.

James Evans

4 Answers


It's not remotely close to pure bash, but Python has a shlex module, which attempts to provide shell-compatible lexing.

>>> import shlex, pprint
>>> pprint.pprint(shlex.split('''VAR1="some text here" VAR2='some another text' some script --with --some=args'''))
['VAR1=some text here',
 'VAR2=some another text',
 'some',
 'script',
 '--with',
 '--some=args']

The following, more complete example uses this Python module from bash, with a NUL-delimited stream providing unambiguous transport:

shlex() {
  # Let Python's shlex module do the lexing; write each word followed by a NUL byte
  python3 -c $'import sys, shlex\nfor arg in shlex.split(sys.stdin):\n\tsys.stdout.write(arg)\n\tsys.stdout.write(\"\\0\")'
}
args=()
# Read the NUL-delimited words back into a bash array
while IFS='' read -r -d ''; do
  args+=( "$REPLY" )
done < <(shlex <<<$'VAR1="some text here" VAR2=\'some another text\' some script --with --some=args')
printf '%s\n' "${args[@]}"
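If you're on bash 4.4 or newer, the read loop can be collapsed into a single mapfile -d '' call. A minimal sketch, assuming python3 is on your PATH; shlex_split is just a name picked for the wrapper function:

```bash
#!/bin/bash
# Hypothetical wrapper: lex stdin with Python's shlex, emit each word + NUL.
shlex_split() {
  python3 -c 'import shlex, sys; sys.stdout.write("".join(a + "\0" for a in shlex.split(sys.stdin.read())))'
}

input=$'VAR1="some text here" VAR2=\'some another text\' some script --with --some=args'

# -d '' makes mapfile split on NUL (bash >= 4.4); -t strips the NUL from each element.
mapfile -t -d '' args < <(shlex_split <<<"$input")
printf '%s\n' "${args[@]}"
```

Like the version above, this strips the quotes (VAR1=some text here), because that is what shlex does.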
Charles Duffy
  • I find it ironic that `shlex` is built to behave like bash, but then must be used in bash as a workaround for a missing direct API – jozxyqk May 09 '14 at 06:47
  • @jozxyqk, `shlex` is built to behave like POSIX sh, not bash. Bash has a number of extensions to the standard which shlex doesn't support, such as `$''` quoting (with backslash-escape sequences, but no evaluation) – Charles Duffy May 09 '14 at 17:32

Huh, seems I'm late to the party :)

Here is how I deal with environment vars passed before a script.

First of all, the escape_args function escapes the spaces "inside" the passed vars, so if the user passes VAR="foo bar", it becomes VAR=foo\0040bar.
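The \0040 trick is just an octal escape (040 octal = 32 decimal = ASCII space) that echo -e and printf '%b' decode back into a space. A small round-trip sketch; the sed command here only stands in for the encoding step and is not part of escape_args itself:

```bash
#!/bin/bash
value='foo bar'

# Encode: replace every space with the literal five characters \0040.
encoded=$(printf '%s' "$value" | sed 's/ /\\0040/g')
echo "$encoded"

# Decode: printf %b (like echo -e) reads \0nnn as an octal byte value.
decoded=$(printf '%b' "$encoded")
echo "$decoded"
```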

function escape_args {
  local str=''
  local opt=''
  for c in $1; do
    if [[ "$c" =~ ^[[:alnum:]]+=[\"|\'] ]]; then
      if [[ "${c: -1}" =~ [\"|\']  ]]; then
        str="$str $( echo $c | xargs )"
      else
        # first opt chunk
        # entering collector
        opt="$c"
      fi
    else
      if [ -z "$opt" ]; then
        # not inside collector
        str="$str $c"
      else
        # inside collector
        if [[ "${c: -1}" =~ [\"|\']  ]]; then
          # last opt chunk
          # adding collected chunks and this last one to str
          str="$str $( echo "$opt\0040$c" | xargs )"
          # leaving collector
          opt=''
        else
          # middle opt chunk
          opt="$opt\0040$c"
        fi
      fi
    fi
  done
  echo "$str"
}

Let's test it against a modified version of your input:

s="VAR1=\"some text here\" VAR2='some another text' VAR3=\"noSpaces\" VAR4='noSpacesToo' VAR5=noSpacesNoQuotes some script --with --some=args"

echo $(escape_args "$s")

VAR1=some\0040text\0040here VAR2=some\0040another\0040text VAR3=noSpaces VAR4=noSpacesToo VAR5=noSpacesNoQuotes some script --with --some=args

See, all vars are space-escaped and their quotes removed, so declare will work correctly.

Now you can iterate through the parts of your input.

Here is an example of how you can declare the vars and build the command to run:

cmd=''
for c in $(escape_args "$s"); do
  [[ "$c" =~ ^[[:alnum:]]+= ]] && declare "$(echo -e $c)" && continue
  cmd="$cmd $c"
done

echo VAR1 is set to $VAR1
echo VAR2 is set to $VAR2
echo VAR3 is set to $VAR3
echo VAR4 is set to $VAR4
echo VAR5 is set to $VAR5
echo $cmd

This loop does two simple things:

  • declaring a var if the chunk matches the NAME= pattern (one or more alphanumeric characters followed by =)
  • adding the chunk to the final cmd otherwise

So the output will be:

VAR1 is set to some text here
VAR2 is set to some another text
VAR3 is set to noSpaces
VAR4 is set to noSpacesToo
VAR5 is set to noSpacesNoQuotes
some script --with --some=args
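A variant of that loop, assuming the same escaped format as input. It splits with read -ra instead of an unquoted $(...) (so no globbing), assigns with printf -v and %b instead of declare "$(echo -e ...)", accepts underscores in names, and stops treating NAME=value chunks as assignments after the first command word, which is how the shell itself treats them. The sample string and the in_prefix flag are illustrative:

```bash
#!/bin/bash
# Sample in the format escape_args produces: spaces inside values encoded as \0040.
escaped='VAR1=some\0040text\0040here VAR3=noSpaces some script --with --some=args'

# Split on whitespace; -r keeps the backslashes, and no glob expansion happens.
read -ra chunks <<<"$escaped"

cmd=()
in_prefix=1   # assignments are only recognised before the first command word
for c in "${chunks[@]}"; do
  if (( in_prefix )) && [[ $c =~ ^([[:alpha:]_][[:alnum:]_]*)=(.*)$ ]]; then
    # printf -v assigns to the named variable; %b turns \0040 back into a space.
    printf -v "${BASH_REMATCH[1]}" '%b' "${BASH_REMATCH[2]}"
  else
    in_prefix=0
    cmd+=("$c")
  fi
done

echo "VAR1 is set to $VAR1"
echo "VAR3 is set to $VAR3"
printf '%s\n' "${cmd[@]}"
```

Keep in mind that printf -v will just as happily overwrite PATH or IFS, so with untrusted input you may want to restrict the accepted names, for example to a known prefix.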

Is this close to your needs?


You can play with the following pure bash code. It walks over the input character by character and keeps track of whether it is inside or outside quotes.

#! /bin/bash 
string=$(cat <<'EOF'
VAR1="some text here" VAR2='some another text' VAR3="a'b" VAR4='a"b' VAR5="a\"b" VAR6='a'"'"'b' some script --with --some=args
EOF
)
echo "$string"

results=()
result=''
inside=''
for (( i=0 ; i<${#string} ; i++ )) ; do
    char=${string:i:1}
    if [[ $inside ]] ; then
        if [[ $char == \\ ]] ; then
            if [[ $inside == '"' && ${string:i+1:1} == '"' ]] ; then
                let i++
                char=$inside
            fi
        elif [[ $char == $inside ]] ; then
            inside=''
        fi
    else
        if [[ $char == ["'"'"'] ]] ; then
            inside=$char
        elif [[ $char == ' ' ]] ; then
            char=''
            results+=("$result")
            result=''
        fi
    fi
    result+=$char
done
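if [[ $inside ]] ; then
    echo Error parsing "$result"
    exit 1
fi
# The last token is not followed by a space, so store it explicitly
[[ $result ]] && results+=("$result")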

for r in "${results[@]}" ; do
    echo "< $r >"
done
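If keeping the quotes in the output (as in the question) is what you want, the same pure-bash idea can also be expressed as a regular expression applied repeatedly with =~ instead of a character loop. A sketch, assuming the values never contain backslash escapes or $'...' quoting; token_re is a name chosen for illustration:

```bash
#!/bin/bash
string=$'VAR1="some text here" VAR2=\'some another text\' some script --with --some=args'

# One token = a run of unquoted non-space chars, "double-quoted" or 'single-quoted' parts.
token_re=$'^[[:space:]]*(([^[:space:]"\']|"[^"]*"|\'[^\']*\')+)'

tokens=()
rest=$string
while [[ $rest =~ $token_re ]]; do
  tokens+=("${BASH_REMATCH[1]}")
  # Drop the matched text (including leading whitespace) and continue.
  rest=${rest:${#BASH_REMATCH[0]}}
done

# Anything left other than whitespace means an unmatched quote.
if [[ $rest =~ [^[:space:]] ]]; then
  echo "Error parsing: $rest" >&2
  exit 1
fi

printf '< %s >\n' "${tokens[@]}"
```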
choroba
  • +1++++ , now that's maniacal bash coding! ;-) Good luck to all. – shellter Oct 10 '12 at 15:49
  • It's not quite true to how the shell behaves -- `VAR5`, in particular, is something that a POSIX shell wouldn't parse this way, and the `$''` extension isn't handled. – Charles Duffy Oct 10 '12 at 16:16

You could use a stream editor such as sed to modify the text. First, grab the variable assignments using a regular expression and replace each of them (together with the whitespace after it) with empty quotes. Then add a quote at the beginning and at the end of the string. At this stage you should have:

VAR1="some text here" 
VAR2='some another text'

in separate strings, and the original string will look like:

"""""some script --with --some=args"

Standard command-line parsing will return:

""
""
"some script --with --some=args"

Throw out the empty strings, and you should have what you want left over. This is a hacky (potential) solution, and I would urge testing/thinking about it a bit before using something like this.
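Here is a sketch of the same idea (match the assignments with a regular expression, keep what is left as the command) that swaps sed for bash's own =~ operator, peeling off each leading NAME=value assignment in a loop. It assumes each value is a single double-quoted, single-quoted or unquoted word without backslash escapes; assign_re is an illustrative name:

```bash
#!/bin/bash
string=$'VAR1="some text here" VAR2=\'some another text\' some script --with --some=args'

# NAME= followed by a "double", 'single' or unquoted value, then whitespace.
assign_re=$'^([[:alpha:]_][[:alnum:]_]*=("[^"]*"|\'[^\']*\'|[^[:space:]"\']*))[[:space:]]+'

assignments=()
rest=$string
while [[ $rest =~ $assign_re ]]; do
  assignments+=("${BASH_REMATCH[1]}")
  # Remove the assignment and its trailing whitespace from the front.
  rest=${rest:${#BASH_REMATCH[0]}}
done

# One assignment per line, then the remaining command as a single line.
printf '%s\n' "${assignments[@]}" "$rest"
```

For the sample input this prints the three lines from the question; with zero assignments, rest is simply the whole string.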

Stephen Garle