In my shell script, a single string holds all the arguments:

arg_str="arg1=ABC      arg2=123 arg3= arg4=\"consecutive      spaces and \\\"escaped quote\\\" shall be preserved within quotes\""

I want to split arg_str on spaces but keep everything inside double quotes as is. The expected result is:

arg1=ABC
arg2=123
arg3=
arg4=consecutive      spaces and \"escaped quote\" shall be preserved within quotes

This answer can split by spaces outside double quotes, but it cannot preserve consecutive spaces within double quotes.

EDIT: Why do I need this? I'm designing a shell script that accepts dynamic arguments, where both the argument names and values are dynamic, e.g. myscript.sh arg1=ABC arg2=123 arg3= arg4="consecutive spaces and \"escaped quote\" shall be preserved within quotes". I'm not good at bash. Someone mentioned getopts, but it does not seem suitable for dynamic arguments.

duckegg
  • Using a string for this is fundamentally broken. See https://mywiki.wooledge.org/BashFAQ/050 – tripleee Dec 16 '19 at 14:51
  • Well, why would you ever want to keep it in such a string? There is also `getopts`. – stephanmg Dec 16 '19 at 15:00
  • Does this answer your question? [How do I parse command line arguments in Bash?](https://stackoverflow.com/questions/192249/how-do-i-parse-command-line-arguments-in-bash) – stephanmg Dec 16 '19 at 15:01

2 Answers

If you are willing to use perl (which is available on most Linux/Unix distributions), this is doable:

arg_str="arg1=ABC      arg2=123 arg3= arg4=\"consecutive      spaces and \\\"escaped quote\\\" shall be preserved within quotes\""

perl -pe 's/"[^"\\]*(?:\\.[^"\\]*)*"(*SKIP)(*F)|\h+/\n/g' <<< "$arg_str"

arg1=ABC
arg2=123
arg3=
arg4="consecutive      spaces and \"escaped quote\" shall be preserved within quotes"

RegEx Details:

  • ": Match opening "
  • [^"\\]*: Match 0 or more characters that are neither " nor \
  • (?:: Start a non-capturing group
    • \\: Match a \
    • .: Match any character
    • [^"\\]*: Match 0 or more characters that are neither " nor \
  • )*: End non-capturing group. * means repeat this group 0 or more times
  • ": Match closing "
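  • (*SKIP): Marks a point past which the regex engine may not backtrack if the pattern fails later; the next match attempt resumes from this position
  • (*F): Short for (*FAIL); it always fails, like the empty negative lookahead (?!), so the quoted string just matched is skipped instead of being replaced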
  • |: OR
  • \h+: Match 1 or more horizontal whitespace characters (spaces or tabs)
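
To use the result inside a script rather than just print it, the output can be read into an array. This is only a minimal sketch, assuming bash 4+ (for mapfile) and perl 5.10+ (for the backtracking verbs). The array name tokens is made up for illustration, and the replacement here is a plain \n so that no trailing blanks end up in the elements:

```shell
#!/usr/bin/env bash
# Same input as above; single quotes keep \" as a literal backslash + quote.
arg_str='arg1=ABC      arg2=123 arg3= arg4="consecutive      spaces and \"escaped quote\" shall be preserved within quotes"'

# Turn every run of horizontal whitespace outside double quotes into a
# newline, then read the resulting lines into the array `tokens`
# (illustrative name), one element per line.
mapfile -t tokens < <(
  perl -pe 's/"[^"\\]*(?:\\.[^"\\]*)*"(*SKIP)(*F)|\h+/\n/g' <<< "$arg_str"
)

# Show each element between <> so that its exact boundaries are visible.
for t in "${tokens[@]}"; do
  printf '<%s>\n' "$t"
done
```

Each element still carries its surrounding double quotes and backslash escapes, exactly as perl printed them.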

If you don't want perl, here is a GNU awk solution (FPAT is a gawk extension):

awk -v FPAT='[^[:blank:]]*"[^\\\\"]*(\\\\.[^\\\\"]*)*"|[^[:blank:]]+' \
'{for (i=1; i<=NF; i++) print $i}' <<< "$arg_str"

arg1=ABC
arg2=123
arg3=
arg4="consecutive      spaces and \"escaped quote\" shall be preserved within quotes"
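
If neither perl nor gawk is an option, a similar token pattern can be tried in pure bash with [[ =~ ]]. The following is a rough sketch rather than a drop-in solution: split_args, tokens and out are hypothetical names, and the regex is a simplified variant of the patterns above, not the same expression. The last loop additionally strips the outer double quotes from each value, on the assumption that every token has the form name=value:

```shell
#!/usr/bin/env bash
# split_args (hypothetical helper) fills the global array `tokens`.
split_args() {
  local s=$1
  # A token is a run of non-blank, non-quote characters and/or complete
  # double-quoted sections in which backslash escapes (such as \") may occur.
  local re='^([^[:blank:]"]|"([^"\\]|\\.)*")+'
  tokens=()
  while :; do
    # Drop leading blanks (spaces and tabs) before the next token.
    s=${s#"${s%%[![:blank:]]*}"}
    [[ -n $s ]] || break
    if [[ $s =~ $re ]]; then
      tokens+=("${BASH_REMATCH[0]}")
      s=${s:${#BASH_REMATCH[0]}}
    else
      # Unbalanced quote: keep the remainder as a single token and stop.
      tokens+=("$s")
      break
    fi
  done
}

arg_str='arg1=ABC      arg2=123 arg3= arg4="consecutive      spaces and \"escaped quote\" shall be preserved within quotes"'
split_args "$arg_str"

# Optional post-processing: remove the outer double quotes from each value.
out=()
for t in "${tokens[@]}"; do
  name=${t%%=*}      # text before the first '='
  value=${t#*=}      # text after the first '='
  if [[ $value == \"*\" ]]; then
    value=${value:1:${#value}-2}
  fi
  out+=("$name=$value")
done
printf '%s\n' "${out[@]}"
```

Backslash escapes inside the value are left untouched, so \"escaped quote\" stays as written. An unbalanced quote is not treated as an error here; the rest of the string simply becomes the last token.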
anubhava
  • I'd somehow suggest not to encourage the OP to work on broken by design problem. – stephanmg Dec 16 '19 at 15:16
  • There is no intention to encourage/discourage any design pattern. All I see in question is a requirement to split an input with quotes and escaped quotes which needs to be split on *horizontal whitespaces* that are outside quotes. – anubhava Dec 16 '19 at 15:18
  • @OP: After reading your edited question where it appears that this string seems to be coming as command line argument to your script. If that's the case why not just use `printf '<%s>\n' "$@"` and check output. – anubhava Dec 16 '19 at 16:13
  • @anubhava Yes, I use this to parse command line dynamic arguments. I'm new to shell scripting and not sure if it is a good solution. Your perl and printf version work great and solve my problem. Thanks – duckegg Dec 16 '19 at 23:44

I don't know why you would want to do this, but here is a bash way to do it:

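#!/bin/bash

# Usage: myscript.sh arg1=ABC arg2=123 arg3= arg4="consecutive      spaces and \"escaped quote\" shall be preserved within quotes"

# parameters must be an associative array (bash 4+) to accept string keys
declare -A parameters

for arg in "$@"
do
    argname="${arg%%=*}"    # everything before the first '='
    argvalue="${arg#*=}"    # everything after the first '=', so values may contain '='
    parameters[$argname]="$argvalue"
done

printf "\n"
echo "${parameters[arg2]}"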
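
To see what the loop ends up storing, here is a small self-contained sketch, assuming bash 4+ for associative arrays. The set -- line only stands in for the real command line of myscript.sh. It declares parameters with declare -A and splits each argument at its first '=':

```shell
#!/usr/bin/env bash
# Simulate: myscript.sh arg1=ABC arg2=123 arg3= arg4="consecutive ... quotes"
# The shell itself handles the quoting, so $4 arrives as one word.
set -- arg1=ABC arg2=123 arg3= arg4="consecutive      spaces and \"escaped quote\" shall be preserved within quotes"

declare -A parameters            # associative array: string keys, bash 4+
for arg in "$@"; do
    # Split at the first '=': name on the left, value on the right.
    parameters["${arg%%=*}"]="${arg#*=}"
done

# Print every value between <> so empty values and inner spaces are visible.
for name in arg1 arg2 arg3 arg4; do
    printf '%s=<%s>\n' "$name" "${parameters[$name]}"
done
```

Because the shell has already processed the quoting, the stored arg4 value keeps its consecutive spaces, but \" has become a plain ", and the outer quotes are gone.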

Regards!

Matias Barrios