
I have a JSON file with 80+ fields. When I extract the message field from the JSON file below using jq, the output contains newline and tab characters. I want to remove these escape sequences; I tried using sed, but it did not work.

Sample JSON file:

{
"HOSTNAME":"server1.example",
"level":"WARN",
"level_value":30000,
"logger_name":"server1.example.adapter",
"content":{"message":"ERROR LALALLA\nERROR INFO NANANAN\tSOME MORE ERROR INFO\nBABABABABABBA\n BABABABA\t ABABBABAA\n\n BABABABAB\n\n"}
}

Can anyone help me with this?

Benjamin W.
user3792699
  • so you **never** want a new-line or tab char in that file? OR are there multiple entries in one file? (Please update your Q, and I will delete this comment). Good luck. – shellter Oct 29 '16 at 16:17
  • If you use the `-r` option, `jq` will translate escape sequences into real newlines, tabs etc. Is that what you want? `jq -r .content.message file.json`? – hek2mgl Oct 29 '16 at 16:36
  • No I want to remove the newline and tab spaces – user3792699 Oct 29 '16 at 17:16
  • For clarity, please add the expected output matching the sample input to your question (one remaining ambiguity is whether you want the enclosing double quotes stripped as well or not). – mklement0 Oct 29 '16 at 18:49

3 Answers


A pure jq solution:

$ jq -r '.content.message | gsub("[\\n\\t]"; "")' file.json
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB

If you want to keep the enclosing " characters, omit -r.

Note: peak's helpful answer contains a generalized regular expression that matches all control characters in the ASCII and Latin-1 Unicode range by way of a Unicode category specifier, \p{Cc}. jq uses the Oniguruma regex engine.
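As a minimal sketch of that generalization applied to the single field, the following strips every control character (including a carriage return, which the `[\\n\\t]` class above does not cover). The inlined `json` value is a shortened, made-up variant of the sample document, used here only so the example is self-contained.

```shell
# Assumption: a cut-down, hypothetical variant of the sample document.
json='{"content":{"message":"ERROR LALALLA\nERROR INFO\tMORE\r\n"}}'

# \p{Cc} matches any control character (newline, tab, carriage return, ...).
# The backslash is doubled because the regex sits inside a jq string literal.
cleaned=$(printf '%s' "$json" | jq -r '.content.message | gsub("\\p{Cc}"; "")')

printf '%s\n' "$cleaned"
```

This should print `ERROR LALALLAERROR INFOMORE` on one line.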


Other solutions use an additional utility, such as sed or tr.

Using sed to unconditionally remove the escape sequences \n and \t:

$ jq '.content.message' file.json | sed 's/\\[tn]//g'
"ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB"

Note that the enclosing " are still there, however. To remove them, add another substitution to the sed command:

$ jq '.content.message' file.json | sed 's/\\[tn]//g; s/"\(.*\)"/\1/'
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB

A simpler option that also removes the enclosing " (note: output has no trailing \n):

$ jq -r '.content.message' file.json | tr -d '\n\t'
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB

Note how -r makes jq output the decoded string, turning the \n and \t escape sequences into actual newline and tab characters, which tr then removes.
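The difference between the two pipelines can be seen side by side in the sketch below. The tiny `json` value is a made-up stand-in for the sample file.

```shell
# Assumption: a tiny hypothetical document instead of file.json.
json='{"content":{"message":"A\nB\tC"}}'

# Without -r: jq prints a JSON string, so the escapes stay as two-character
# sequences (backslash + letter) and sed has to match a literal backslash.
encoded=$(printf '%s' "$json" | jq '.content.message' | sed 's/\\[tn]//g')

# With -r: jq decodes the string, so tr sees real newline and tab characters.
raw=$(printf '%s' "$json" | jq -r '.content.message' | tr -d '\n\t')

printf '%s\n%s\n' "$encoded" "$raw"
```

The first line of output keeps the double quotes (`"ABC"`), the second does not (`ABC`). One caveat with the sed variant: because it works on the still-encoded text, it would also match the tail of an escaped backslash followed by `t` or `n` (for example in `"C:\\temp"`), which the `-r` plus tr variant avoids.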

Community
mklement0

With your input, the following incantation:

$ jq 'walk(if type == "string" then gsub("\\p{Cc}"; "<>") else . end)' file.json

produces:

{
  "HOSTNAME": "server1.example",
  "content": {
    "message": "ERROR LALALLA<>ERROR INFO NANANAN<>SOME MORE ERROR INFO<>BABABABABABBA<> BABABABA<> ABABBABAA<><> BABABABAB<><>"
  },
  "level": "WARN",
  "level_value": 30000,
  "logger_name": "server1.example.adapter"
}

Of course, the above invocation is just illustrative:

  • you might not need to use walk/1 at all. (walk/1 walks the input JSON.)
  • you might want to use a different character class, or specify a pipeline of gsub/2 invocations.
  • if you simply want to excise the control characters, specify "" as the second argument of gsub/2.
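As one possible combination of these points, the sketch below skips walk/1, updates only `.content.message` in place, and passes "" as the replacement. The inlined `json` value is a shortened, hypothetical version of the sample document.

```shell
# Assumption: a shortened, hypothetical version of the sample document.
json='{"level":"WARN","content":{"message":"A\nB\tC\n"}}'

# No walk/1: update only the one field in place with |=, and use "" as the
# replacement so the control characters are removed rather than marked.
result=$(printf '%s' "$json" | jq -c '.content.message |= gsub("\\p{Cc}"; "")')

printf '%s\n' "$result"
```

The rest of the document passes through unchanged, so this should print `{"level":"WARN","content":{"message":"ABC"}}`.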

If you do want to use walk/1 but your jq does not have it, then simply add its definition (easily available on the web, such as here) before its invocation.
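For illustration, a definition can be placed in front of the filter as sketched below. The body of `walk` here is modeled on the commonly circulated definition and is an assumption, not necessarily the exact text behind the link above; the `json` value is made up for the demo.

```shell
# Assumption: a small hypothetical document with nested strings.
json='{"a":"x\ty","b":["p\nq",1]}'

result=$(printf '%s' "$json" | jq -c '
  # Fallback for jq builds without a builtin walk/1 (assumed definition).
  # Applies f bottom-up to every value; object keys come out sorted.
  def walk(f):
    . as $in
    | if type == "object" then
        reduce keys[] as $key ({}; . + {($key): ($in[$key] | walk(f))}) | f
      elif type == "array" then map(walk(f)) | f
      else f
      end;
  walk(if type == "string" then gsub("\\p{Cc}"; "") else . end)
')

printf '%s\n' "$result"
```

On a jq that already ships walk/1, the local definition simply takes precedence over the builtin for this program. The expected output is `{"a":"xy","b":["pq",1]}`.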

mklement0
peak
  • ++ for several advanced techniques, but, truthfully, the simple `jq -r '.content.message | gsub("[\\n\\t]"; "")' file.json` solution that _could_ be derived from your answer is obscured by the incidental / generalized information. – mklement0 Nov 01 '16 at 20:34
  • @mklement0 - (1) The question includes the phrase "from JSON file" and mentions a large number of fields. Since it's not clear what is actually needed, I thought a generally useful answer would be more generally useful :-)) (2) The question mentions "escape sequence characters" generally, and TAB, NL and CR specifically, whereas the solution you mention in these comments does not cover all three. – peak Nov 01 '16 at 20:45
  • Fair points - there's often ambiguity in the description itself and inconsistencies between the description and the sample data ("newline characters and tab spaces [sic]" are mentioned alongside "escape sequences"). I personally find your answer very useful and learned from it, but my point was that a "gentler" framing with more context could have helped. – mklement0 Nov 01 '16 at 20:50

With jq v1.6, the following is possible:

jq -rc ".content.message" file.json
  • Since only a string value is being extracted, `-c` (`--compact-output`) has no effect here, and your solution doesn't do what the question asks for (removal of newlines and tabs). – mklement0 Aug 30 '23 at 20:34