
I'm trying to access data from a file full of JSON objects, where the string fields may or may not have escaped quotes in them.

When I try to process it using while read line ; do echo ; done < input.txt, it pukes because of the unbalanced number of quotes.

I have this input file:

$ cat input.txt 
{"sku":"1234", "desc":"Necklace 18\" long", "img":"https://provider.com/12345.jpg", "imgView":"A"}
{"sku":"1234", "desc":"Necklace 18\" long", "img":"https://provider.com/12346.jpg", "imgView":"B"}

When I use read to capture it on the way in, I lose the backslash.

$ while read line ; do echo "${line}" ; done < input.txt
{"sku":"1234", "desc":"Necklace 18" long", "img":"https://provider.com/12345.jpg", "imgView":"A"}
{"sku":"1234", "desc":"Necklace 18" long", "img":"https://provider.com/12346.jpg", "imgView":"B"}

$ while read line ; do echo "${line}" | sed 's/\\/\\\\/g' ; done < input.txt
{"sku":"1234", "desc":"Necklace 18" long", "img":"https://provider.com/12345.jpg", "imgView":"A"}
{"sku":"1234", "desc":"Necklace 18" long", "img":"https://provider.com/12346.jpg", "imgView":"B"}

I have this workaround that I'll use for now to unblock myself. But it's ugly and verbose.

#Showing that it's preserving the escape
$ input=input.txt ; counter=1; length=$(cat ${input} | wc -l)
$ while [ ${counter} -le ${length} ] ; do data=$(tail -n +${counter} ${input} | head -n 1 ) ; echo ${data} ; counter=$(( counter + 1)) ; done 
{"sku":"1234", "desc":"Necklace 18\" long", "img":"https://provider.com/12345.jpg", "imgView":"A"}
{"sku":"1234", "desc":"Necklace 18\" long", "img":"https://provider.com/12346.jpg", "imgView":"B"}

#Showing that jq can actually process the data now
$ input=input.txt ; counter=1; length=$(cat ${input} | wc -l)
$ while [ ${counter} -le ${length} ] ; do desc=$(tail -n +${counter} ${input} | head -n 1 | jq '.desc' -r) ; echo ${desc} ; counter=$(( counter + 1)) ; done 
Necklace 18" long
Necklace 18" long

I feel like I have to get way too low level into how the shell is handling the input. There has to be an easier way, or a flag or something that I'm missing.

Beweeted
  • Use `read -r` to treat backslashes literally instead of as an escape character. – Barmar Nov 16 '21 at 06:06
  • Some versions of `echo` also do weird things with backslashes; use `printf '%s\n' "$line"` instead. – Gordon Davisson Nov 16 '21 at 07:02
  • Thank you Barmar. That should definitely be RTFM material, but `man read` was just the manual page telling me about shell built-ins. – Beweeted Nov 16 '21 at 07:07
  • @Barmar - I confirmed it works for me. If you give it as an answer, I'll mark as answered. Thank you again. – Beweeted Nov 16 '21 at 07:09
  • @GordonDavisson Only `echo -e` does that. – Barmar Nov 16 '21 at 07:10
  • @Barmar *Some* versions of `echo` (in some modes) only translate backslashes if given the `-e` option; others (or the same one in other modes) do... different things. Stéphane Chazelas has a good explanation of why it's such a mess in [this Unix & Linux answer](https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo/65819#65819). Also, note that the [POSIX spec for `echo`](https://pubs.opengroup.org/onlinepubs/009695399/utilities/echo.html) says that on XSI-conformant systems, backslashes *will* be processed (and doesn't mention `-e`). – Gordon Davisson Nov 16 '21 at 07:24
  • Your question can't be about [tag:bash] and [tag:zsh] at the same time; which is it? – tripleee Nov 16 '21 at 07:39
  • @GordonDavisson The old guidance about `echo` obviously is moot if you are using a specific shell whose built-in `echo` does not do that. (But I do agree that `printf` is better simply because it avoids this distraction.) – tripleee Nov 16 '21 at 07:39
  • @tripleee You can't even count on a shell's builtin `echo` for consistent behavior. I once had a bunch of my scripts break because an OS update included a version of bash compiled with different options, which changed its behavior. That's when I really got the `printf` religion. – Gordon Davisson Nov 16 '21 at 08:57
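
Putting the two suggestions from the comments together (`read -r` and `printf` instead of `echo`), the loop from the question could look roughly like the sketch below. The temp file and the `IFS=` prefix are additions for illustration, not something from the thread: the temp file just recreates the sample input, and `IFS=` keeps `read` from trimming leading and trailing whitespace.

```shell
# Recreate the sample input from the question in a temp file (illustrative only).
input=$(mktemp)
trap 'rm -f "$input"' EXIT
cat > "$input" <<'EOF'
{"sku":"1234", "desc":"Necklace 18\" long", "img":"https://provider.com/12345.jpg", "imgView":"A"}
{"sku":"1234", "desc":"Necklace 18\" long", "img":"https://provider.com/12346.jpg", "imgView":"B"}
EOF

# -r makes read treat backslashes literally; IFS= preserves surrounding whitespace.
# printf '%s\n' prints the line as-is, with no backslash interpretation.
while IFS= read -r line; do
    printf '%s\n' "$line"
done < "$input"
```

With this, both lines should come out exactly as they are in the file, including the `\"` inside `desc`, so they remain valid JSON for whatever processes them next.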

1 Answer


The escaping of the quotes inside your example data is valid JSON throughout. Depending on what you intend to do inside your loop, maybe it can be accomplished by jq itself, which automatically iterates over each object for you. You can even use it to unescape a given string using the --raw-output option like so:

 jq --raw-output '.desc' input.txt
Necklace 18" long
Necklace 18" long

Again, it all depends on what your actual goal is inside the loop. The following example even encodes for HTML using the @html builtin:

jq --raw-output '@html "<h1>\(.desc)</h1>\n<img src=\"\(.img)\"/>\n"' input.txt
<h1>Necklace 18&quot; long</h1>
<img src="https://provider.com/12345.jpg"/>

<h1>Necklace 18&quot; long</h1>
<img src="https://provider.com/12346.jpg"/>
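
If a shell loop is still needed for each record, one option is to let jq decode the JSON once and have the loop read only the decoded strings. This is a rough sketch rather than a definitive recipe: the temp file recreates the sample input, the `desc:` prefix is only there to show where the loop body runs, and it assumes no value contains a newline, since each line of jq output is treated as one record.

```shell
# Recreate the sample input from the question in a temp file (illustrative only).
input=$(mktemp)
trap 'rm -f "$input"' EXIT
cat > "$input" <<'EOF'
{"sku":"1234", "desc":"Necklace 18\" long", "img":"https://provider.com/12345.jpg", "imgView":"A"}
{"sku":"1234", "desc":"Necklace 18\" long", "img":"https://provider.com/12346.jpg", "imgView":"B"}
EOF

# jq parses every object and emits the unescaped .desc values, one per line.
# read -r keeps any backslashes in those values literal.
jq --raw-output '.desc' "$input" | while IFS= read -r desc; do
    printf 'desc: %s\n' "$desc"
done
```

Because the loop sits on the right-hand side of a pipe, in bash it runs in a subshell, so variables set inside it are not visible after `done`.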
ikegami
pmf