Using jq to process json with control characters

Question

I have the following JSON file (output.json) with control characters in it (line breaks, tabs, etc.):

{"data”:{“gherkin”:”Given user successful login
And status is '<currentStatus>'
When user clicks '<nextStatus>’
Then status message should change to '<message>'
    Examples:
        | currentStatus | nextStatus    | message       |
        | READY         | PROCESS   | ready to process |
        | PROCESS       | COMPLETE  | ready to complete |
"}}

I need to get the value of the "gherkin" field and write it to another file, keeping the same format as in the original JSON.

When I run this jq command:

jq .data.gherkin output.json

it throws an error:

parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 9, column 1

If I remove all control characters from output.json, I lose the original format of the "gherkin" value. Is there a way to accomplish this using jq?

Thanks!

Your file is anything but JSON. JSON has no typographic quotes and it does not have literal newline characters in strings. Your JSON is invalid. Instead of trying to fix the parsing end, fix the broken component that produced this file. — Tomalak, Jan 31 '22 at 18:05
Control chars need to be escaped to be valid JSON. https://www.tutorialspoint.com/json_simple/json_simple_escape_characters.htm Assuming you can't fix the system generating the invalid JSON, you should be able to use tools such as `tr` and `sed` to replace the control chars with the appropriate escape sequences before passing the file to `jq`. — superstator, Jan 31 '22 at 18:06
If you have access to the code generating that data, consider feeding it as raw text (not JSON) into jq using the `--raw-input` option. It'll generate valid JSON strings (newlines included if you also provide the `--slurp` option), which you can utilize when generating the surrounding JSON structure. In short: `… | jq -Rs .` — pmf, Jan 31 '22 at 18:40
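A minimal sketch of pmf's suggestion, assuming the generator can first write the raw Gherkin text to its own file. `wrap_gherkin` and the file names are hypothetical; the jq options `-R` (`--raw-input`), `-s` (`--slurp`) and `-j` (`--join-output`) are standard, but `-j` needs a reasonably recent jq.

```shell
# Wrap raw text (literal newlines, tabs, and all) into a JSON document.
# -R reads the input as raw text instead of JSON, -s slurps all of it into
# ONE string (newlines included), and the filter builds the surrounding
# {"data": {"gherkin": ...}} structure; jq escapes the control characters.
wrap_gherkin() {
    # $1: raw Gherkin text file, $2: JSON file to write
    jq -Rs '{data: {gherkin: .}}' "$1" > "$2"
}
```

For example, `wrap_gherkin gherkin.txt output.json` produces valid JSON, after which `jq -j .data.gherkin output.json > copy.txt` writes the text back out byte for byte (`-j` prints the raw string without adding a trailing newline).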

Answer 1 · ikegami · answered Jan 31 '22 at 18:43

As the error message suggests, there's no need to remove them; you just need to escape them. For example, byte 0A (line feed) could be replaced with \u000a. That particular one could also be replaced with \n.
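A quick check that both escape forms decode to the same line feed. `decode_json_string` is a hypothetical helper name, and the sketch assumes a jq that supports `-j` (`--join-output`).

```shell
# Decode a JSON string literal with jq and print the raw result.
# printf '%s' passes the backslashes through untouched, so jq sees the
# JSON escape sequences (\u000a, \n) rather than real newlines.
decode_json_string() {
    # $1: a JSON string literal, including its surrounding double quotes
    printf '%s' "$1" | jq -j .
}
```

`decode_json_string '"a\u000ab"'` and `decode_json_string '"a\nb"'` both print `a`, a line break, then `b`.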

This can be used to fix up your input:

perl -pe's/[\x00-\x1F]/ sprintf "\\u%04X", ord $& /eg'

(See also: "Specifying file to process to Perl one-liner".)

So, you could chain the two.

perl -pe's/[\x00-\x1F]/ sprintf "\\u%04X", ord $& /eg' output.json |
   jq .data.gherkin

Answer 2 · score 1 · answered Jan 31 '22 at 18:41

With your input,

sed 's/$/\\n/' output.json | tr -d '\n' | sed -e 's/“/"/g' -e 's/”/"/g' | sed '$ s/\\n$//' | jq .

yields:

{
  "data": {
    "gherkin": "Given user successful login\nAnd status is '<currentStatus>'\nWhen user clicks '<nextStatus>’\nThen status message should change to '<message>'\n    Examples:\n        | currentStatus | nextStatus    | message       |\n        | READY         | PROCESS   | ready to process |\n        | PROCESS       | COMPLETE  | ready to complete |\n"
  }
}

The point being that once you have valid JSON, you can use jq or any other JSON-oriented tool.
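The pipeline above stops at `jq .`. A sketch that finishes the question's task by swapping that last step for an extraction into a file; `extract_gherkin_sed` is a hypothetical name. Note that the pipeline only turns line breaks into `\n`, so a literal tab (U+0009) in the file would still trigger the same parse error.

```shell
# Answer 2's repair pipeline applied to a file, followed by the extraction.
extract_gherkin_sed() {
    # $1: broken JSON file, $2: file to receive the Gherkin text
    sed 's/$/\\n/' "$1" |              # append a literal \n to every line
        tr -d '\n' |                   # then delete the real line breaks
        sed -e 's/“/"/g' -e 's/”/"/g' |  # straighten the typographic quotes
        sed '$ s/\\n$//' |             # drop the \n added after the closing braces
        jq -j .data.gherkin > "$2"     # -j: raw text, no extra trailing newline
}
```

For example, `extract_gherkin_sed output.json gherkin.txt`.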
