The bash shell does not support a multi-character delimiter through IFS (each character in IFS acts as a separate delimiter), so it cannot keep a quoted string together as one word. But since we are dealing with a file, we can use GNU Awk, whose FPAT variable lets us define what each field looks like.
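To see the limitation concretely, here is a minimal sketch in bash. The sample line is an assumption, reconstructed from the output shown further down in this answer, and `words` is just an illustrative array name.

```shell
#!/usr/bin/env bash
# Assumed sample line; the real input lives in myFile.
line='foo bar "foo bar baz" baz'

# read -a splits on every IFS character and knows nothing about the
# double quotes inside the data, so the quoted string is broken apart.
IFS=' ' read -r -a words <<< "$line"

printf 'word count: %s\n' "${#words[@]}"
printf '%s\n' "${words[@]}"
```

The quoted string ends up spread over three array elements, which is exactly what the FPAT approach avoids.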
From the GNU Awk manual, under Defining Fields by Content:
Normally, when using FS, gawk defines the fields as the parts of the record that occur in between each field separator. In other words, FS defines what a field is not, instead of what a field is. However, there are times when you really want to define the fields by what they are, and not by what they are not.
The latter case is where FPAT comes in. For your requirement, with space-separated strings and strings within double quotes, we define the pattern below. It matches either a run of non-space characters, or a double-quoted string whose contents contain no double quote.
FPAT = "([^[:space:]]+)|("[^"]+")"
But to write it as a string in Awk, you need to escape the double quotes above:
awk 'BEGIN{FPAT = "([^[:space:]]+)|(\"[^\"]+\")"}{for(i=1;i<=NF;i++) print $i}' myFile
This prints each word of the input from your question on a separate line:
foo
bar
"foo bar baz"
baz
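If you want to try the pattern without creating a file, a minimal sketch follows. The function name `split_fields` is hypothetical, and the sample line is assumed from the output above. FPAT is a gawk extension, so the sketch calls `gawk` explicitly and checks that it is installed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original command): reads lines on
# stdin and prints one FPAT-defined field per line.
split_fields() {
    gawk 'BEGIN { FPAT = "([^[:space:]]+)|(\"[^\"]+\")" }
          { for (i = 1; i <= NF; i++) print $i }'
}

if command -v gawk >/dev/null 2>&1; then
    # Assumed sample line, reconstructed from the output shown above.
    printf '%s\n' 'foo bar "foo bar baz" baz' | split_fields
else
    echo 'gawk not found: FPAT is a gawk extension' >&2
fi
```

The check matters because an awk without FPAT support treats it as an ordinary variable and falls back to plain whitespace splitting, without reporting an error.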
From here, to store the words in a bash array, all you need is process substitution and the mapfile command:
mapfile -t newArray < <(awk 'BEGIN{FPAT = "([^[:space:]]+)|(\"[^\"]+\")"}{for(i=1;i<=NF;i++) print $i}' myFile)
You can then inspect the array with
declare -p newArray
or print each element explicitly:
for item in "${newArray[@]}"; do
    printf '%s\n' "$item"
done
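A note on why the mapfile command reads from process substitution rather than from a pipe. The sketch below assumes bash 4 or later with default settings (the `lastpipe` option off). `producer` is a hypothetical stand-in for the awk command, and `piped` and `kept` are illustrative array names.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the awk command: emits one word per line.
producer() { printf '%s\n' 'foo' 'bar' '"foo bar baz"' 'baz'; }

# In a pipeline, mapfile runs in a subshell, so the array it fills
# is gone once the pipeline finishes.
unset piped
producer | mapfile -t piped
echo "piped elements: ${#piped[@]}"

# With process substitution, mapfile runs in the current shell
# and the array survives.
mapfile -t kept < <(producer)
echo "kept elements: ${#kept[@]}"
```

Because each element is stored as a whole line, the quoted string stays in a single array slot, spaces included.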