
I am trying to parse a series of output lines that contain a mix of values and strings. I thought that the set command would be a straightforward way to do it.

An initial test seemed promising. Here's a sample command line and its output:

$ (set "one two" three; echo $1; echo $2; echo $3)
one two
three

Obviously I get two variables echoed and nothing for the third.

However, when I put it inside my script, where I'm using read to capture the output lines, I get a different kind of parsing:

echo \"one two\" three |
while read Line
do
    echo $Line
    set $Line
    echo $1
    echo $2
    echo $3
done

Here's the output:

"one two" three
"one
two"
three

The echo $Line command shows that the quotes are there, but the set command does not use them to delimit a parameter. Why not?

While researching read and while read, I came across the while IFS= read idiom, so I tried that, but it made no difference at all.

I've read through dozens of questions about quoting, but haven't found anything that clarifies this for me. Obviously I've got my levels of quoting confused, but where? And what might I do to get what I want, which is to get the same kind of parsing in my script as I got from the command line?

Thanks.

August
  • What are you actually trying to accomplish here? Could you [edit] the question to provide more context? See also [_XY Problem._](https://en.wikipedia.org/wiki/XY_problem) – tripleee Oct 30 '20 at 08:44
  • Does my answer to [this superuser question](https://superuser.com/questions/1529226/get-bash-to-respect-quotes-when-word-splitting-subshell-output) solve your problem? – Gordon Davisson Oct 30 '20 at 13:25
  • Thanks @tripleee. Yes, I think I may have asked an XY question. – August Nov 03 '20 at 06:55
  • @GordonDavisson, that answer may help with my X question, but not necessarily with my Y problem. See below, in my comments. – August Nov 03 '20 at 06:56

2 Answers


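read does not interpret the quotes. With a single variable name it stores the whole line in Line, and then the unquoted $Line in set $Line is word-split on whitespace, so "one becomes one token and two" another. Quotes that come out of a variable expansion are just ordinary characters. (Think of all the ways in which things could go wrong if the shell evaluated input from random places. The lessons from Python 2 and its flawed input() are also an excellent illustration.)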
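A minimal demonstration of that point in POSIX sh (the variable name Line is taken from the question): quotes typed on the command line are shell syntax, but quotes stored in a variable are only characters, so splitting happens at the spaces inside them.

```shell
#!/bin/sh
# Quotes typed here are shell syntax: set receives two arguments.
set -- "one two" three
printf '[%s]\n' "$@"    # prints [one two] and [three]

# Quotes stored in a variable are ordinary characters. The unquoted $Line
# is only split on IFS (and globbed); it is never re-parsed for quotes.
Line='"one two" three'
set -- $Line
printf '[%s]\n' "$@"    # prints ["one], [two"] and [three]
echo "$#"               # prints 3
```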

If you really want to evaluate things, eval does that; but it comes with a boatload of caveats, and too often leads to security problems if done carelessly.

Depending on what you want to accomplish, maybe provide the inputs on separate lines? Or if these are user-supplied arguments, just keep them in "$@". Notice also how you can pass a subset of them into a function, which gets its own local "$@" if you want to mess with it.
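A sketch of the separate-lines approach, assuming the input arrives as one value per line, three lines per item. The function name process_item and the sample values are hypothetical, for illustration only. Chaining three reads in the loop condition consumes one whole item per iteration, and the function gets the three values as its own "$@".

```shell
#!/bin/sh
# Hypothetical handler: inside a function, "$@" is the function's own
# argument list, so the caller's positional parameters are untouched.
process_item() {
    printf 'id=[%s] name=[%s] due=[%s]\n' "$1" "$2" "$3"
}

# Illustrative input: one value per line, three lines per item.
printf '%s\n' 101 'one two' 2020-11-03 102 'three four' 2020-11-04 |
while IFS= read -r id && IFS= read -r name && IFS= read -r due; do
    process_item "$id" "$name" "$due"
done
```

If the number of input lines is not a multiple of three, the incomplete last item is silently skipped, because one of the reads fails and ends the loop.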

(Tangentially, you are confusing yourself by not quoting the argument to echo. See When to wrap quotes around a shell variable.)
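A small illustration of why the unquoted echo is misleading: it splits the value first and then joins the pieces with single spaces, so it hides what the splitting did.

```shell
#!/bin/sh
Line='one    two'    # four spaces between the words
echo $Line           # unquoted: two arguments, printed as "one two"
echo "$Line"         # quoted: one argument, spacing preserved
```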

tripleee
  • I actually started with separate lines, in pairs of alternating types. My processing script toggles a variable to keep track of which kind is being processed and when it has the second input it kicks off the processing of both together. But now I want to add a third data value to the input to this (which is the output of curl | jq). I had the idea that going this way would avoid the "complexity" of trying to loop through three alternating input data types. Can't I just tell `read` or something else that the input line has three values on it, where one of those values is a quoted string? – August Oct 30 '20 at 16:47
  • My "complexity" in looping through a set of input values was that the code was not following my thinking -- I was using booleans to keep track of the data type, `$isOne`, and `$isTwo`. Adding `$isThree` seemed to confuse the issue and would get worse if I eventually need four because of all the resetting values involved. If I use a single counter instead of booleans, `InputPiece=1`, `=2`, `=3` (and more, if needed) then the code can mirror my thinking more closely. I'll give that a try. – August Oct 30 '20 at 17:00
  • It would still help if you could add some context to your question. If you can extract the strings with `jq`, extracting them to the shell as variables should not be that hard. – tripleee Oct 30 '20 at 20:51
  • I'm querying a database, in this case my personal To-do DB on trello.com. I'm getting the items in a specific list via `curl` and filtering the `curl` data with `jq`. When I have a single value that I want, it's easy to assign the resulting value to a shell variable with something like `\`CurrentItemId=\`curl... | jq -r ' ... '\`` – August Nov 03 '20 at 06:24
  • But when I have three values to extract from a single `curl` command, I was thrashing and thought that `set` might be useful. I've abandoned `set`, and now I pipe the `curl` output to `while IFS= read DataLine` and then use a `case` to process whatever the next value is in sequence. So far that's working, but it still feels kludgy. I REALLY like my code to mirror the problem, and the problem doesn't rotate between different values; how I'm extracting the data is what's doing that. – August Nov 03 '20 at 06:37
  • It has occurred to me that maybe instead of having `jq` just produce the values, in order, I could have `jq` produce a JSON structure with the values and names, put that structure into a shell variable as a single output from the original `curl`, and then parse that structure with another `jq` command to extract each data type in turn. That might be cleaner and clearer and not feel as kludgy. That's my current direction because I have another place where I'm trying the same kind of thing and I really don't want to repeat the kludge, even though I got it working the first time. – August Nov 03 '20 at 06:39
  • [Please excuse the extra `\`` before `CurrentItemID` in the above command example.] – August Nov 03 '20 at 10:52
  • I don't understand why you continue to comment here. Again, please edit your question to clarify it if you want answers to a different question than the one you originally asked. (But don't change it too much. Probably accept one of the answers here and ask a new question if your scope is no longer very close to what you started with.) – tripleee Nov 03 '20 at 10:57
  • Sorry. I'm not used to the idea of changing the question based on the answers rather than recording a conversation. You asked for more context, I attempted to give it. Are you saying that you'd rather I actually add the context to the question and delete the comments so everything looks all tidy? I can do that if necessary. I do appreciate the tutorial. – August Nov 03 '20 at 11:05
  • Again, depends on how much exactly you need to change. Probably the safest bet is to accept one of the answers and ask a new question altogether, as you already have two answers to what you originally asked. If you can make minor edits which do not invalidate these answers, then editing the question to make it clearer should still be fine. – tripleee Nov 03 '20 at 11:32

Why not?

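read splits the input on each character that's in IFS, and so does the shell's word splitting when it expands the unquoted $Line in set $Line. With IFS unset or at its default value, those characters are space, tab, and newline. No other characters are special; in particular, quotes have no special meaning.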
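A quick check of this in POSIX sh: when read is given several variable names, it splits the line on IFS and keeps the double quotes as ordinary characters; the last name receives whatever remains.

```shell
#!/bin/sh
# The braces keep read and printf in the same subshell of the pipeline.
printf '%s\n' '"one two" three' | {
    read -r a b c
    printf '[%s]\n' "$a" "$b" "$c"    # prints ["one], [two"] and [three]
}
```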

Obviously I've got my levels of quoting confused, but where?

You wrongly assumed read is smart enough to interpret quotes. It isn't. Moreover, without -r, read treats backslashes as escape characters and removes them, so use read -r unless you want that. See how to read a stream line by line and the bash manual on word splitting.
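A minimal illustration of the backslash difference (POSIX sh): the input line is the three characters a, backslash, b.

```shell
#!/bin/sh
# Without -r, the backslash acts as an escape character and is dropped.
printf '%s\n' 'a\b' | { read x; printf '%s\n' "$x"; }       # prints ab
# With -r, the backslash is kept literally.
printf '%s\n' 'a\b' | { read -r x; printf '%s\n' "$x"; }    # prints a\b
```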

what might I do to get what I want, which is to get the same kind of parsing in my script as I got from the command line?

To get the same parsing as you got from the command line, you could use eval. But eval is evil; see Eval command and security issues.

echo \"one two\" three |
while IFS= read -r line; do
    eval "set -- $line"  # SUPER VERY UNSAFE DO NOT USE
    printf "%s\n" "$@"
done
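A harmless way to see the danger, with echo standing in for something destructive: eval re-parses the line as shell code, so a command substitution inside the data is executed, not merely split into words. The set -- guards against a first word that starts with a dash.

```shell
#!/bin/sh
# The $( ) below is part of the *data*, yet eval runs it.
line='"$(echo injected)" "one two"'
eval "set -- $line"      # DO NOT do this with untrusted input
printf '[%s]\n' "$@"     # prints [injected] and [one two]
```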

With eval, a malicious user could run echo '"$(rm -rf *)"' | ... and remove your files in an instant. The simplest solution in the shell is xargs, which (somewhat confusingly) does parse quotes when it reads its input.

echo \"one two\" three | xargs -n1 echo
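A sketch of getting the pieces back into shell variables without eval, assuming xargs's own quoting rules are good enough for your data. Those rules (single quotes, double quotes, backslash) are simpler than the shell's and not identical, and an unmatched quote makes xargs report an error. xargs passes the parsed words to a small sh -c script as $1, $2, ...; the trailing sh argument only fills $0. The sample lines and the printf format are illustrative.

```shell
#!/bin/sh
# Let xargs split each line with its quote rules, then hand the pieces
# to a tiny sh -c script as "$1", "$2", ... ("sh" becomes $0).
printf '%s\n' '"one two" three' "'a b' c d" |
while IFS= read -r line; do
    printf '%s\n' "$line" |
        xargs sh -c 'printf "1=[%s] 2=[%s] 3=[%s]\n" "$1" "$2" "$3"' sh
done
```

Each line is fed to its own xargs, so the tokens from one line never spill into the next; the first line yields 1=[one two] 2=[three] 3=[], the second 1=[a b] 2=[c] 3=[d].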
KamilCuk
  • Thanks @KamilCuk and @tripleee, I will NOT use `eval`. I had come across `xargs` as a possible tool for this, but I got thoroughly lost in the man page for it. – August Oct 30 '20 at 16:50