8

The Question:

In bash scripting, what is the best way to convert a string containing literal quotes around multi-word values into an array, so that the result matches what the shell produces when it parses the same text as command-line arguments?

The Controversy:

Many existing questions apply evasive tactics to avoid the problem instead of finding a solution. This question raises the following arguments and encourages the reader to focus on them and, if you are up for it, to take part in the challenge of finding the optimum solution.

Arguments raised:

  1. Although there are many scenarios where this pattern should be avoided because better-suited alternatives exist, the author is of the opinion that valid use cases remain. This question attempts to present one such use case. It makes no claim about its viability, only that it is a conceivable scenario that may arise in a real-world situation.
  2. You must find the optimum solution that satisfies the requirement. The use case was chosen specifically for its real-world applications. You may not agree with the decisions that were made, but you are not asked for an opinion, only for a solution.
  3. Satisfy the requirement without modifying the input or the choice of transport. Both were chosen to reflect a real-world scenario in which those parts are out of your control.
  4. No answers exist for this particular problem, and this question aims to address that. If you are inclined to avoid this pattern, simply avoid the question; but if you are up for the challenge, let's see how you would approach the problem.

The Valid use case:

Converting an existing script, currently in use, to receive parameters via a named pipe or similar stream. To minimize the impact on the myriad scripts outside the developer's control, a decision was made not to change the interface. Existing scripts must be able to pass the same arguments via the new stream implementation as they did before.

Existing implementation:

$ ./string2array arg1 arg2 arg3
args=(
    [0]="arg1"
    [1]="arg2"
    [2]="arg3"
)

Required change:

$ echo "arg1 arg2 arg3" | ./string2array
args=(
    [0]="arg1"
    [1]="arg2"
    [2]="arg3"
)

The problem:

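As pointed out in "Bash and Double-Quotes passing to argv", literal quotes inside a string are not parsed the way one might expect: once they arrive through an expansion they are ordinary characters, not shell syntax.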
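A minimal sketch of the difference, using only the example string from this question (the variable names typed and expanded are illustrative, not part of the workbench):

```bash
#!/usr/bin/env bash
line='arg1 "multi arg 2" arg3'

# Typed on a command line, the quotes are syntax and are removed: 3 words.
typed=(arg1 "multi arg 2" arg3)

# Arriving through an expansion, the quotes are ordinary characters.
# The expansion is deliberately unquoted to show plain word splitting: 5 words.
expanded=($line)

printf 'typed: %d, expanded: %d\n' "${#typed[@]}" "${#expanded[@]}"
printf '[%s]\n' "${expanded[@]}"
```

The second printf shows the quote characters still attached to the first and last fragments of the multi-word value.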

This workbench script can be used to test various solutions. It handles the transport and formulates a measurable response. It is suggested that you focus on the solution script, which is sourced with the string as its argument and should populate the $args variable as an array.

The string2array workbench script:

#!/usr/bin/env bash
#string2array

args=()

function inspect() {
  local inspct=$(declare -p args)
  inspct=${inspct//\[/\\n\\t[}; inspct=${inspct//\'/}; inspct="${inspct:0:-1}\n)"
  echo -e ${inspct#*-a }
}

while read -r; do
  # source the solution to turn $REPLY into the $args array
  source "$1" "${REPLY}"
  inspect
done
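If the string manipulation in inspect gets in the way of reading a result, a plainer loop can serve as a cross-check. The sketch below is not part of the original workbench: inspect_args is a hypothetical name, and unlike declare -p it does not escape double quotes embedded in an element.

```bash
#!/usr/bin/env bash
# Hypothetical alternative to inspect: prints one array element per line.
inspect_args() {
  local i
  printf 'args=(\n'
  for i in "${!args[@]}"; do
    printf '    [%d]="%s"\n' "$i" "${args[i]}"
  done
  printf ')\n'
}

args=(arg1 "multi arg 2" arg3)
inspect_args
```

Because it iterates over the existing indices, it also copes with sparse arrays left behind by solutions that unset elements.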

Standard solution - FAILS

The standard solution for turning a string into a space-delimited array of words works for our first example above:

#solution1

args=($@)

Undesired result

Unfortunately, the standard solution produces an undesired result for quoted multi-word arguments:

$ echo 'arg1 "multi arg 2" arg3' | ./string2array solution1
args=(
    [0]="arg1"
    [1]="\"multi"
    [2]="arg"
    [3]="2\""
    [4]="arg3"
)

The Challenge:

Using the workbench script, provide a solution snippet that produces the following result for the arguments received.

Desired result:

$ echo 'arg1 "multi arg 2" arg3' | ./string2array solution-xyz
args=(
    [0]="arg1"
    [1]="multi arg 2"
    [2]="arg3"
)

The solution should be compatible with standard argument parsing in every way. The following unit test should pass for the provided solution. If you can think of anything currently missing from the unit test, please leave a comment and we can update it.

Unit test for the requirements

Update: Test simplified and now includes the Jonathan Leffler test

#!/usr/bin/env bash
#test_string2array
solution=$1
function test() {
  cmd="echo \"${1}\" | ./string2array $solution"
  echo "$ ${cmd}"
  echo ${1} | ./string2array $solution > /tmp/t
  cat /tmp/t
  echo -n "Result : "
  [[ $(cat /tmp/t|wc -l) -eq 7 ]] && echo "PASSED!" || echo "FAILED!"
}

echo 1. Testing single args
test 'arg1 arg2 arg3 arg4 arg5'
echo
echo 2. Testing multi args \" quoted
test 'arg1 "multi arg 2" arg3 "a r g 4" arg5'
echo
echo 3 Testing multi args \' quoted
test "arg1 'multi arg 2' arg3 'a r g 4' arg5"
echo
echo 4 Jonathan Leffler test
test "He said, \"Don't do that!\" but \"they didn't listen.\""
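The test above only counts output lines, so a wrong split that happens to yield five elements would still pass. One possible stricter check compares the elements themselves. This is a sketch, not part of the original test: assert_args is a hypothetical helper and it assumes the solution has already populated the global args array.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if args holds exactly the expected elements.
assert_args() {
  local -a expected=("$@")
  local -a actual=("${args[@]}")   # re-pack so sparse arrays compare by position
  local i
  (( ${#actual[@]} == ${#expected[@]} )) || return 1
  for i in "${!expected[@]}"; do
    [[ ${actual[i]} == "${expected[i]}" ]] || return 1
  done
}

args=(arg1 "multi arg 2" arg3)
if assert_args arg1 "multi arg 2" arg3; then echo "PASSED!"; else echo "FAILED!"; fi
```

The expected elements are passed as ordinary, shell-parsed arguments, so the check compares a solution directly against standard argument parsing.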
nickl-

7 Answers

4

The declare built-in seems to do what you want; in my test, it's your inspect function that doesn't seem to work properly for all inputs:

# solution3
declare -a "args=($1)"

Then

$ echo "arg1 'arg2a arg2b' arg3" | while read -r; do
>  source solution3 "${REPLY}"
>  for arg in "${args[@]}"; do
>   echo "Arg $((++i)): $arg"
>  done
> done
Arg 1: arg1
Arg 2: arg2a arg2b
Arg 3: arg3
chepner
  • Can you elaborate please? The inspect function has nothing to do with the required solution; it simply prints an inspection of the args collection. You can replace inspect with `printf "%s\n" "${args[@]}"` if you like. It specifically uses declare -p, not declare -a. – nickl- Mar 10 '14 at 09:27
  • You are supposed to implement your own solutionN or improve on any existing solution implementation. – nickl- Mar 10 '14 at 09:30
  • I'm sorry, what *exactly* are you trying to accomplish? The `declare` command does what you ask: it takes the given string and populates the array. As far as I can tell, you are simply trying to enumerate re-implementations of this command. Stack Overflow is not a programming challenge site. – chepner Mar 10 '14 at 12:00
  • Ok my bad! I misunderstood your implementation. Added the unit test results and all works as you mentioned except the Jonathan Leffler test. The negative vote would be retracted, but for some reason the answer requires an edit for me to do so. – nickl- Mar 11 '14 at 11:17
  • I saw the proposed test. You didn't quote the embedded double quotes, so the apostrophe, rather than being inside a double-quoted string, fell *outside* the string and so introduced a new single-quoted string instead. I made a minor edit to the answer so you can reverse the down vote if you like. – chepner Mar 11 '14 at 11:58
  • Vote reversed, what was that about "...double-quoted string, fell outside the string..." can you edit and fix please, not sure what you are referring to. – nickl- Mar 11 '14 at 14:20
  • You tested with `echo "He said, "Don't do that!" but 'they didn't listen.'"`, which as written, has unbalanced quotes. The double quotes for `Don't do that!` need to be escaped (`\"`). – chepner Mar 11 '14 at 14:40
2

You may do it with declare instead of eval, for example:

Instead of:

string='"aString that may haveSpaces IN IT" bar foo "bamboo" "bam boo"'
echo "Initial string: $string"
eval 'for word in '$string'; do echo $word; done'

Do:

declare -a "array=($string)"
for item in "${array[@]}"; do echo "[$item]"; done

But please note, it is not much safer if the input comes from the user!

So, if you try it with, say, a string like:

string='"aString that may haveSpaces IN IT" bar foo "bamboo" "bam boo" `hostname`'

hostname gets evaluated (and it could, of course, be something like rm -rf / instead)!
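The same effect can be seen with a harmless stand-in for a hostile payload. This is a small sketch; the marker text INJECTED is purely illustrative:

```bash
#!/usr/bin/env bash
# The $(...) below stands in for any command an attacker might supply.
string='safe $(echo INJECTED)'

# declare expands the array contents much like eval would,
# so the command substitution runs.
declare -a "array=($string)"

printf '[%s]\n' "${array[@]}"
```

The second element holds the output of the embedded command rather than the literal text, which is why this approach should only be used on trusted input.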

A very simple attempt to guard against this is to replace characters such as the backtick ` and $:

string='"aString that may haveSpaces IN IT" bar foo "bamboo" "bam boo" `hostname`'
declare -a "array=( $(echo $string | tr '`$<>' '????') )"
for item in "${array[@]}"; do echo "[$item]"; done

Now you get output like:

[aString that may haveSpaces IN IT]
[bar]
[foo]
[bamboo]
[bam boo]
[?hostname?]

More details about the different methods and their pros can be found in this good answer: Why should eval be avoided in Bash, and what should I use instead?

See also https://superuser.com/questions/1066455/how-to-split-a-string-with-quotes-like-command-arguments-in-bash/1186997#1186997

But an attack vector still remains. I would very much like bash to have a way of quoting a string like double quotes (") do, but without interpreting the content.
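One way to avoid interpretation altogether is to walk the string character by character and never hand it back to the shell parser. The following is a sketch under stated assumptions, not a complete reimplementation of shell quoting: split_quoted is a hypothetical name, only spaces and tabs separate words, a backslash outside single quotes always takes the next character literally (bash itself is more selective inside double quotes), and nothing is expanded.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the global array args from one string, honouring
# '...', "..." and backslash escapes without evaluating any of the content.
split_quoted() {
  local input=$1 quote='' word='' in_word=0 ch i
  args=()
  for (( i = 0; i < ${#input}; i++ )); do
    ch=${input:i:1}
    if [[ $quote == "'" ]]; then
      # Inside single quotes everything is literal up to the closing quote.
      if [[ $ch == "'" ]]; then quote=''; else word+=$ch; fi
    elif [[ $ch == '\' ]]; then
      # Simplification: the character after a backslash is taken literally.
      i=$(( i + 1 ))
      word+=${input:i:1}
      in_word=1
    elif [[ $quote == '"' ]]; then
      if [[ $ch == '"' ]]; then quote=''; else word+=$ch; fi
    elif [[ $ch == "'" || $ch == '"' ]]; then
      quote=$ch
      in_word=1            # an empty pair of quotes still yields an empty word
    elif [[ $ch == ' ' || $ch == $'\t' ]]; then
      if (( in_word )); then args+=("$word"); word=''; in_word=0; fi
    else
      word+=$ch
      in_word=1
    fi
  done
  [[ -z $quote ]] || return 1   # unbalanced quote: report failure
  if (( in_word )); then args+=("$word"); fi
  return 0
}

split_quoted 'arg1 "multi arg 2" arg3 `hostname`' && printf '[%s]\n' "${args[@]}"
```

Since the input is only ever read as data, backticks and $(...) stay literal text in the resulting elements. On an unbalanced quote the function returns non-zero and args holds only the words completed so far.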

Hubbitus
1

First attempt

Populate a variable with the combined words once the opening quote is detected, and only append it to the array once the closing quote arrives.

Solution

#solution2
j=''
for a in ${1}; do
  if [ -n "$j" ]; then
    [[ $a =~ ^(.*)[\"\']$ ]] && {
      args+=("$j ${BASH_REMATCH[1]}")
      j=''
    } || j+=" $a"
  elif [[ $a =~ ^[\"\'](.*)$ ]]; then
    j=${BASH_REMATCH[1]}
  else
    args+=($a)
  fi
done

Unit test results:

$ ./test_string2array solution2
1. Testing single args
$ echo "arg1 arg2 arg3 arg4 arg5" | ./string2array solution2
args=(
    [0]="arg1"
    [1]="arg2"
    [2]="arg3"
    [3]="arg4"
    [4]="arg5"
)
Result : PASSED!

2. Testing multi args " quoted
$ echo 'arg1 "multi arg 2" arg3 "a r g 4" arg5' | ./string2array solution2
args=(
    [0]="arg1"
    [1]="multi arg 2"
    [2]="arg3"
    [3]="a r g 4"
    [4]="arg5"
)
Result : PASSED!

3 Testing multi args ' quoted
$ echo "arg1 'multi arg 2' arg3 'a r g 4' arg5" | ./string2array solution2
args=(
    [0]="arg1"
    [1]="multi arg 2"
    [2]="arg3"
    [3]="a r g 4"
    [4]="arg5"
)
Result : PASSED!
nickl-
1

So I think xargs actually works for all your test cases, e.g.:

echo 'arg1 "multi arg 2" arg3' | xargs -0 ./string2array
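To use the same idea inside a solution file, the words produced by xargs have to be captured back into the array. The sketch below is one possible way, not something the answer itself provides: xargs_split is a hypothetical wrapper, mapfile needs bash 4 or later, and xargs is run without -0 because with -0 quotes are taken literally. Note that xargs applies its own quoting rules, which are close to but not identical to the shell's.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the body a solution file might contain.
xargs_split() {
  # xargs splits the line using its own quote handling and passes each word
  # to printf, which emits one word per line; mapfile collects the lines.
  mapfile -t args < <(printf '%s\n' "$1" | xargs printf '%s\n')
}

xargs_split 'arg1 "multi arg 2" arg3'
printf '[%s]\n' "${args[@]}"
```

A trade-off of this design is that each call spawns external processes, and xargs reports an error when the quotes in the input are unbalanced.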
problemPotato
  • Brilliant idea! How would you fit this into the solution file, though, so that it can be run against the unit tests? In other words, you have a string variable and you use xargs to populate an array correctly. – nickl- Mar 11 '14 at 11:27
  • The array for me is already in `$BASH_ARGV`, I ran `echo ${#BASH_ARGV[@]}; echo ${BASH_ARGV[@]}` to test it. Found that here: http://stackoverflow.com/a/2741116/3388817 – problemPotato Mar 11 '14 at 13:31
0

Second attempt

Append the element in place without the need for an additional variable.

#solution3
for i in $1; do
  [[ $i =~ ^[\"\'] ]] && args+=(' ')
  lst=$(( ${#args[@]}-1 ))
  [[ "${args[*]}" =~ [[:space:]]$ ]] && args[$lst]+="${i/[\"\']/} " ||  args+=($i)
  [[ $i =~ [\"\']$ ]] && args[$lst]=${args[$lst]:1:-1}
done
nickl-
0

Modify the delimiter

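In this solution we turn the whitespace into linefeeds, remove the quotes and restore the spaces inside the multi-word arguments, so that the arguments split correctly.

#solution4
nl=$'\n'                      # linefeed as temporary delimiter; a line from read cannot contain one
s=${*//[[:space:]]/$nl}       # map: every whitespace character becomes a linefeed
while [[ $s =~ [\"\']([^\"\']*)[\"\'] ]]; do
  # unmap: drop the quotes and turn the linefeeds inside them back into spaces
  s=${s/"$BASH_REMATCH"/${BASH_REMATCH[1]//$nl/ }}
done
old_ifs=$IFS IFS=$nl
args=(${s})                   # split on linefeeds only
IFS=$old_ifs                  # restore IFS for the sourcing script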

NEEDS WORK!!

nickl-
  • This solution breaks when there are commas in the original arguments. The unmap operation is not the inverse of the map operation. – Jonathan Leffler Mar 10 '14 at 04:20
  • linefeed should not be possible at all so no need to check for it either, Tx for spotting – nickl- Mar 10 '14 at 05:19
  • Who said line feeds aren't allowed in arguments? The only character not allowed is an ASCII null, and I don't think you can use those as the delimiters usefully. The full gory details of this are really extremely nasty. I'm not sure there's a complete solution, but I admire your attempt to try. I've not come up with a solution either — and I've tried on occasion in decades past, but I've not made a serious recent attempt using Bash and arrays. (My last serious attempt would have been Korn shell mainly, and not exploiting arrays.) – Jonathan Leffler Mar 10 '14 at 05:30
  • **linefeed should not be _possible_** I didn't say not allowed. It's not possible because of the `while read;` loop which reads lines and we are looking at parsing the line hence unlikely to include a linefeed, agreed? I am having a hard time with your "Don't do that test though" =) – nickl- Mar 10 '14 at 09:38
  • Before it dawned on me that linefeed might be the solution (will incorporate in tests), I looked at a fixed collection of delimiters or even the ascii hex range using printf '\x45' for example and substituting the base 16 number to compare with what is currently in use in the argument string, ensure map ∝ unmap – nickl- Mar 10 '14 at 09:49
0

Modify in place

Let bash convert the string to array and then loop through to fix it.

args=($@) cnt=${#args[@]} idx=-1 chr=
for (( i=0; i<cnt; i++ )); do
  [[ $idx -lt 0 ]] && {
    [[ ${args[$i]:0:1} =~ [\'\"] ]] && \
       idx=$i chr=${args[$idx]:0:1} args[$idx]="${args[$idx]:1}"
    continue
  }
  args[$idx]+=" ${args[$i]}"
  unset args[$i]
  [[ ${args[$idx]: -1:1} == $chr ]] && args[$idx]=${args[$idx]:0:-1} idx=-1
done
nickl-