data.in:

a b c 'd e'

script.sh:

while read -a arr; do
    echo "${#arr[@]}"
    for i in "${arr[@]}"; do
        echo "$i"
    done
done

Command:

cat data.in | bash script.sh

Output:

5
a
b
c
'd
e'

Question:

How can I get 'd e' as a single element in the array?


Update. This is the best I've done so far:

while read line; do
    arr=()
    while read word; do
        arr+=("$word")
    done < <(echo "$line" | xargs -n 1)
    echo "${#arr[@]}"
    for i in "${arr[@]}"; do
        echo "$i"
    done
done

Output:

4
a
b
c
d e

However, the following data.in:

"a\"b" c

breaks it (and every other script I have found so far, even those in the dup question):

xargs: unmatched double quote; by default quotes are special to xargs unless you use the -0 option

But this input is legal, because you can type this on the command line:

echo "a\"b" c

And it runs fine. So this is a mismatch in behavior, not illegal input.

Cyker
  • The right answer here is to use `xargs printf '%s\0'` to parse your string into a NUL-delimited stream, which bash can read unambiguously. (`xargs`, when not using `-d` or `-0` extensions, uses shell-like parsing rules to split input into words). See, specifically, http://stackoverflow.com/a/31485948/14122 – Charles Duffy Dec 24 '16 at 20:23
  • I think this question does a better job of expressing the problem in a MCVE than the duplicate it links to. +1 for that. Also, Charles, Tim Toady. I would have suggested a different route if this question was still open to answers other than yours. – ghoti Dec 24 '16 at 21:12
  • @ghoti, eh? I haven't provided an answer at all here (only a comment, and you're perfectly able to provide your own as well), and the linked question *is* open. I'd be happy to see another correct answer there. – Charles Duffy Dec 24 '16 at 21:28
  • @ghoti, ...tangentially, though, I think TIMTOWTDI is a **horrible** philosophy. The Pythonic approach, that there should be "one -- and preferably only one -- obvious way to do it", means that the body of idiom is smaller, so there's less need to carefully audit for corner cases when those patterns are correctly deployed. (This is a rant born of experience: Taking over a commercial Perl codebase written by folks with a different body of idiom was one of the more unpleasant events of my early career). – Charles Duffy Dec 24 '16 at 21:30
  • Alas, the backwards-compatibility needs of shell don't allow the Right Thing to be the only obvious thing, so what we end up with, instead of a language designed to nudge folks towards good practices, is the need for education, static-checking, and a robust set of community practices. – Charles Duffy Dec 24 '16 at 21:36
  • @Cyker, re: your update with a proposed example, be sure to use the `-r` argument to `read`, or else literal backslashes will be stripped. That's important, because `one\ word` won't be parsed as a single word if `read` strikes the backslash before `xargs` can get to it. Or you might use the answer from the question linked as duplicate -- or edit your question to clarify it in such a way as to make clear that the linked duplicate doesn't apply. – Charles Duffy Dec 25 '16 at 06:39
  • @CharlesDuffy Thank you for `-r`. The answer in the dup is for a different job and doesn't seem to work with `data.in` here. I'm still testing the code to see whether there are other corner cases which may fail it. – Cyker Dec 25 '16 at 12:23
  • Can you clarify "doesn't work"? The only difference I see is generating an array line-by-line vs generating an array with the contents of the whole file. To be clear, if you have a genuinely different input format (or otherwise nontrivially varying requirements), I'm happy to reopen the question. – Charles Duffy Dec 25 '16 at 16:24
  • BTW, what's the desired behavior if your file contains a backslash right before a newline? Should the result be an array with a newline in its literal contents? – Charles Duffy Dec 25 '16 at 16:27
  • @CharlesDuffy For example, if `data.in` here contains `"a\"b"`, then the dup answer reports *xargs: unmatched double quote; by default quotes are special to xargs unless you use the -0 option* even with `-r`. But `echo "a\"b"` works fine. So there is a mismatch in behavior. I posted on that answer as well if you think the comments here are too long. – Cyker Dec 25 '16 at 21:59
  • @Cyker, that's a bug in the dupe answer -- which I've extended it to address -- not a place where the questions differ. The right thing is to fix the duplicate, so we have a single, canonical place with a correct answer, not one place (there) with a wrong answer and another place (here) with a correct one. – Charles Duffy Dec 26 '16 at 18:57
  • ...which goes in part to the reason *why* we keep duplicate questions in our database: To have multiple pointers to the same canonical answer, so we can focus our efforts on making that answer as correct and expansive as possible, rather than having ragtag attempts at answers that only address half the problem space individually (and where someone trying to find an answer is at the mercy of fortune in terms of how correct a specific instance of the question and answer they find happens to be). – Charles Duffy Dec 26 '16 at 19:00
  • ...the other part of that is to have as many pointers as possible to that same correct answer: People can frame the same question in several different ways, so a good dupe asks a question in a different enough way that someone is more likely to find that canonical location *through* the duplicate than they would if the only keywords they could search on are the ones the first person to ask the same question used in writing it up. When ghoti complimented you on asking this question well, I agree -- it *is* asked well, and that's why we close good duplicates *but don't delete them*. – Charles Duffy Dec 26 '16 at 19:01
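
A minimal sketch of the approach described in the comments above: `read -r` keeps backslashes intact, `xargs printf '%s\0'` splits the line using xargs' quoting rules, and bash collects the NUL-delimited words into an array. The helper name `parse_line` and the global array `arr` are assumptions made for illustration, not names taken from the linked answer. As the question notes, xargs rejects input such as `"a\"b" c`, so this sketch does not cover that case.

```bash
#!/usr/bin/env bash
# parse_line is a hypothetical helper: it splits its first argument into
# words using xargs' quoting rules and stores them in the global array "arr".
parse_line() {
    arr=()
    local word
    # xargs parses quotes and backslashes, then runs printf, which writes
    # each word followed by a NUL byte. read -d '' splits on those NULs,
    # so spaces inside a word survive.
    while IFS= read -r -d '' word; do
        arr+=("$word")
    done < <(printf '%s\n' "$1" | xargs printf '%s\0')
}

# Demo with the sample line from the question.
parse_line "a b c 'd e'"
echo "${#arr[@]}"            # number of words
printf '%s\n' "${arr[@]}"    # one word per line; the last one is: d e
```

To process a whole file line by line, the same helper could be called from a `while IFS= read -r line; do parse_line "$line"; ...; done` loop in `script.sh`.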

1 Answer

$ eval "a=($(cat data.in))"
$ for i in "${a[@]}";do echo "|$i|";done
|a|
|b|
|c|
|d e|
$
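
A note on the `eval` approach, as a sketch with made-up input strings: because `eval` re-parses the text with the shell's own grammar, it accepts the `"a\"b" c` input from the question, but for the same reason it also executes any command substitution the input contains. That makes it suitable only when the file is fully trusted.

```bash
#!/usr/bin/env bash
# The input that xargs rejects is parsed the way the shell itself would.
line='"a\"b" c'
eval "arr=($line)"
printf '|%s|\n' "${arr[@]}"   # two elements: a"b and c

# Hypothetical untrusted input: the command substitution is executed
# by eval instead of being kept as literal text.
line='a $(echo INJECTED) c'
eval "bad=($line)"
printf '|%s|\n' "${bad[@]}"   # the second element is INJECTED
```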
Waxrat