3

There are a few questions with good answers on how to split strings in Bash scripts by a given separator.

My problem is that I have a file with space-separated strings, some of which may be quoted, e.g.

foo bar "foo bar baz" baz

which I'd like to split into the four values foo, bar, foo bar baz and baz.

How do I split such input into a Bash array while respecting the quotes?

codeforester
muffel
    The question is a bit terse. Is it just a single line of strings, or do the strings span multiple lines? Also, what have you tried so far to resolve this? – sjsam Nov 22 '17 at 12:06

2 Answers

4

Bash does not support multi-character delimiters in IFS, and IFS splitting has no notion of quoted fields inside the data. Since we are dealing with a file, though, we can use GNU Awk, whose FPAT variable lets us describe what each field looks like.

From the GNU Awk manual, under Defining Fields by Content:

Normally, when using FS, gawk defines the fields as the parts of the record that occur in between each field separator. In other words, FS defines what a field is not, instead of what a field is. However, there are times when you really want to define the fields by what they are, and not by what they are not.

The latter case is where FPAT comes in. For your requirement (space-separated strings, some of them enclosed in double quotes) we define the following pattern, which matches either a run of non-whitespace characters or a double-quoted string whose contents contain no double quote:

FPAT = "([^[:space:]]+)|("[^"]+")"

To write this pattern as a string in Awk, the inner double quotes must be escaped:

awk 'BEGIN{FPAT = "([^[:space:]]+)|(\"[^\"]+\")"}{for(i=1;i<=NF;i++) print $i}' myFile

This prints each field of the input from the question on a separate line:

foo
bar
"foo bar baz"
baz
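For comparison, the same idea of defining fields by what they are can be sketched in pure Bash with the `=~` operator. This is not the gawk approach from this answer, only an illustration of the pattern: the function name `split_quoted` and the array name `fields` are hypothetical, and the regex is a slight variant of the FPAT above (the unquoted alternative excludes double quotes, and empty quoted strings are allowed).

```bash
#!/usr/bin/env bash
# Hypothetical helper: split one line into fields, where a field is either
# a double-quoted string or a run of characters that are neither
# whitespace nor double quotes. The result is stored in the global array
# "fields" (an assumed name, not from the original answer).
split_quoted() {
    local line=$1
    # Keep the regex in a variable so the shell does not interpret the
    # quotes inside it when it is used with =~.
    local re='^[[:space:]]*("[^"]*"|[^[:space:]"]+)'
    fields=()
    while [[ $line =~ $re ]]; do
        fields+=("${BASH_REMATCH[1]}")
        # Drop the matched prefix (leading whitespace plus the field).
        line=${line:${#BASH_REMATCH[0]}}
    done
}

split_quoted 'foo bar "foo bar baz" baz'
printf '%s\n' "${fields[@]}"
```

As with the awk output, the quoted field keeps its surrounding double quotes. The loop stops at the first position where neither alternative matches, for example at an unbalanced quote.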

To store the awk output in a Bash array, all you need is process substitution and the mapfile command:

mapfile -t newArray < <(awk 'BEGIN{FPAT = "([^[:space:]]+)|(\"[^\"]+\")"}{for(i=1;i<=NF;i++) print $i}' myFile)
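The reason for process substitution rather than a plain pipe can be shown with a small sketch. It assumes Bash 4 or later (for `mapfile`) and default shell options (`lastpipe` off); the function `emit_fields` is a hypothetical stand-in for the awk command so that the example runs without gawk.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the awk command: emits one field per line.
emit_fields() {
    printf '%s\n' foo bar '"foo bar baz"' baz
}

# Pipeline: mapfile runs in a subshell, so the array it fills is gone
# once the pipeline finishes.
unset piped
emit_fields | mapfile -t piped
echo "after pipe: ${#piped[@]} elements"

# Process substitution: mapfile runs in the current shell, so the array
# is still there afterwards. -t strips the trailing newline of each line.
mapfile -t newArray < <(emit_fields)
echo "after process substitution: ${#newArray[@]} elements"
```

Each line of output becomes one array element, so a field containing spaces stays intact as long as it does not contain a newline.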

And then you can print the array as

declare -p newArray 

or print it explicitly:

for item in "${newArray[@]}"; do
    printf '%s\n' "$item"
done
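Note that the elements still carry their surrounding double quotes, as the `"foo bar baz"` line in the awk output shows, whereas the question asks for the bare value. A minimal sketch for stripping them with parameter expansion follows; the array literal is a stand-in for the result of the `mapfile` command, and the name `stripped` is an assumption for illustration.

```bash
#!/usr/bin/env bash
# Stand-in for the array filled by mapfile in the answer.
newArray=(foo bar '"foo bar baz"' baz)

stripped=()
for item in "${newArray[@]}"; do
    # Strip only when the element both starts and ends with a double quote.
    if [[ $item == \"*\" ]]; then
        item=${item#\"}   # remove the leading quote
        item=${item%\"}   # remove the trailing quote
    fi
    stripped+=("$item")
done

declare -p stripped
```

Elements without surrounding quotes are copied unchanged, so the loop is safe to run over the whole array.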
Inian
0

If the line contains only one double-quoted item, you can use this sed command:

sed 's/ /\n/g;h;s/[^"]*"\([^"]*\).*/"\1/;s/\n/ /g;x;G;s/\([^"]*\)"\([^"]*\)\("[^"]*\)\n\(".*\)/\1\4\3/' infile

If it contains one or more, you can use this awk command:

awk -F'"' -vOFS='"' '{for (i=1;i<=NF;i++)if((i%2)==1){gsub(" ","\n",$i)}}1' infile
ctac_