1

I am using a command in my shell script that returns multiple strings, each enclosed inside "". Since I need each of these strings as separate elements of an array, I am splitting this collection of strings by using " as the delimiter, like this:

IFS='"'
arr=($(command that returns multiple strings enclosed in ""))

Now, since there is a " character at the beginning of each string, my script splits each string into a blank string and the string itself. For example, the strings "foo" "bar" will be split into (empty string), foo, (empty string again), and bar. So my array ends up having 4 elements, instead of 2.

There can be two approaches to overcome this, and any help in implementing either would be helpful:

  1. Somehow getting rid of the whitespace while splitting.
  2. Creating the array with the whitespaces, and then creating another array, and only inserting those elements from the first into the second array which are not whitespaces.
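
For illustration, approach 2 could look roughly like the sketch below. The function `emit_quoted` is a hypothetical stand-in for the real command, and the filter assumes that "whitespace" means any element that is empty or contains only whitespace characters.

```shell
# Hypothetical stand-in for the real command (an assumption, not from the question).
emit_quoted() { printf '%s\n' '"foo" "bar"'; }

old_ifs=$IFS
IFS='"'
# Unquoted expansion: split on the double quote (this is also subject to globbing).
raw=( $(emit_quoted) )
IFS=$old_ifs

filtered=( )
for item in "${raw[@]}"; do
  # Keep only elements that contain at least one non-whitespace character.
  if [[ $item =~ [^[:space:]] ]]; then
    filtered+=( "$item" )
  fi
done

declare -p raw filtered
```

Here `raw` ends up with four elements and `filtered` with only `foo` and `bar`.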

I am tagging the question as both bash and ksh, as a solution in bash would be acceptable too. Thanks!

lebowski
  • `arr=( $(...) )` is bad practice from the start. There's effectively *never* a place where it's something one should do. – Charles Duffy Dec 18 '17 at 22:39
  • If you have control over the output, you can use `eval arr=($(echo '"foo" "bar"'))`. This can be potentially dangerous though, for example `eval arr=($(echo '"foo" "$(echo dangerous)"'))`. – PesaThe Dec 18 '17 at 22:41
  • (If you want to split words on a delimiter, consider `IFS='"' read -r -a arr <<<"$string"` instead -- that way glob expansion is avoided. Not the right tool for parsing shell-quoted content, however). – Charles Duffy Dec 18 '17 at 22:47

2 Answers

2

Unless the quoted strings contain newlines, you can use xargs to process your quoted strings into a NUL-delimited list of words:

array=( )
while IFS= read -r -d '' piece; do
  array+=( "$piece" )
done < <(command-that-returns-multiple-quoted-strings | xargs printf '%s\0')
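
As a quick way to try this out, the sketch below runs the same loop against a hypothetical stand-in function, `emit_quoted`, whose output is an assumed sample (two quoted strings, one containing a space). It relies on `xargs` invoking an external `printf` that understands `\0` in its format string.

```shell
# Hypothetical stand-in for the real command (assumed sample output).
emit_quoted() {
  printf '%s\n' '"foo" "bar baz"'
}

array=( )
# xargs parses the quotes and passes each string to printf as one argument;
# printf then emits every argument followed by a NUL byte.
while IFS= read -r -d '' piece; do
  array+=( "$piece" )
done < <(emit_quoted | xargs printf '%s\0')

declare -p array
```

The space inside `"bar baz"` survives because the stream is split only on NUL bytes, not on whitespace.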

If the quoted strings you're splitting do contain newlines, xargs won't work properly; consider the Python standard-library shlex module instead:

shell_quotes_to_NULs() {
  python -c '
import sys, shlex
for piece in shlex.split(sys.stdin.read()):
    sys.stdout.write(piece)
    sys.stdout.write("\0")
'
}

array=( )
while IFS= read -r -d '' piece; do
  array+=( "$piece" )
done < <(command-that-returns-multiple-quoted-strings | shell_quotes_to_NULs)
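
To see the newline case in action, here is a small self-contained sketch. It assumes a `python3` interpreter is on the PATH (the function above calls `python`), and `emit_quoted` is a hypothetical stand-in whose second string contains an embedded newline.

```shell
# Same idea as shell_quotes_to_NULs above, but calling python3 (an assumption).
shell_quotes_to_NULs() {
  python3 -c '
import sys, shlex
for piece in shlex.split(sys.stdin.read()):
    sys.stdout.write(piece)
    sys.stdout.write("\0")
'
}

# Hypothetical stand-in for the real command: the second string spans two lines.
emit_quoted() {
  printf '"foo" "bar\nbaz"\n'
}

array=( )
while IFS= read -r -d '' piece; do
  array+=( "$piece" )
done < <(emit_quoted | shell_quotes_to_NULs)

declare -p array
```

The second element keeps its embedded newline, which is the case `xargs` cannot handle.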
Charles Duffy
1

If you want to store only the strings in double quotes and ignore everything else, here is a GNU awk (gawk) solution that correctly handles newlines inside the strings:

arr=()
while IFS= read -r -d '' item; do
   arr+=("$item")
done < <(cmd | gawk -v RS='"[^"]*"' 'RT { gsub("\"", "", RT); printf "%s\0", RT }')

With bash 4.4 or later:

readarray -d '' arr < <(cmd | gawk -v RS='"[^"]*"' 'RT { gsub("\"", "", RT); printf "%s\0", RT }')
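
To illustrate just the `readarray -d ''` part in isolation, the sketch below feeds it a NUL-delimited stream from a hypothetical producer function, `produce_items`, instead of the gawk pipeline. It assumes bash 4.4 or later.

```shell
# Hypothetical producer: three NUL-terminated items, one with an embedded newline.
produce_items() {
  printf '%s\0' 'foo' $'bar\nbaz' 'qux quux'
}

# -d '' tells readarray to split on NUL bytes instead of newlines.
readarray -d '' arr < <(produce_items)

printf 'count=%d\n' "${#arr[@]}"
declare -p arr
```

Because items are split only on NUL, newlines and spaces inside an item are preserved.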
PesaThe
  • This would replace `"*"` with a list of filenames in the current directory unless you turn off globbing prior to the expansion. – Charles Duffy Dec 18 '17 at 22:52
  • @CharlesDuffy Yep. My previous "solution" overall was just a hasty attempt, it wasn't working properly...I tried to come up with a solution that would also accept `` and I've updated my answer. – PesaThe Dec 18 '17 at 23:57
  • I'd still call the awk approach here relatively fragile, if put to general-purpose use cases. If I test it with `"foo bar" "baz qux" one\ two three" "four`, none of `one`, `two`, `three` or `four` are present in the output at all, whereas `xargs` gets it all right. – Charles Duffy Dec 19 '17 at 00:05
  • @CharlesDuffy hm, it works for me with `awk`: `declare -a arr='([0]="foo bar" [1]="baz qux" [2]=" ")'`. Which is exactly what I would imagine OP wants. It ignores strings that are not in quotes. – PesaThe Dec 19 '17 at 00:14
  • Ahh -- that's not what I'm expecting the OP to want in those cases (the question this is flagged as a duplicate of is explicitly asking for parsing equivalent to how a shell would behave, which is a rather common request), but I suppose that's a question for them to answer. (As I read the question, they don't anticipate anything but whitespace to exist outside quotes, but *do* want to preserve all content other than whitespace present after string-splitting). – Charles Duffy Dec 19 '17 at 00:15
  • @CharlesDuffy True. I may have misunderstood the question. I tried to create a solution that will save strings in quotes and ignore the rest. – PesaThe Dec 19 '17 at 00:18
  • @CharlesDuffy reading the question again, each string should be enclosed in quotes, so this `awk` may actually be usable. – PesaThe Dec 19 '17 at 00:29
  • I agree that that's the only input that they anticipate existing, and thus that this should work for them individually/personally. OTOH, I don't see other folks trying to use this question being likely to have the same constraint hold, unless maybe the title were explicitly edited to ask about extracting *only* quoted data (in which case the question would no longer be a duplicate, and my answer would no longer apply). – Charles Duffy Dec 19 '17 at 00:29