
I am trying to split a large JSON file (~4 million elements) into separate files (one file per element).

The file kinda looks like this:

{
  "books": [
    {
      "title": "Professional JavaScript - \"The best guide\"",
      "authors": [
        "Nicholas C. Zakas"
      ],
      "edition": 3,
      "year": 2011
    },
    {
      "title": "Professional JavaScript",
      "authors": [
        "Nicholas C.Zakas"
      ],
      "edition": 2,
      "year": 2009
    },
    {
      "title": "Professional Ajax",
      "authors": [
        "Nicholas C. Zakas",
        "Jeremy McPeak",
        "Joe Fawcett"
      ],
      "edition": 2,
      "year": 2008
    }
  ]
}

To split each book into a separate file, I am using the following command:

cat books.json | jq -c -M '.books[]' | while read line; do echo $line > temp/$(date +%s%N).json; done

For the last two items everything is fine, because the book title does not contain any quotes. In the first one, however, the \" is replaced by a plain ", which produces a broken JSON file, because the subsequent parser interprets the " as the end of the string.

I've tried to use jq -r, but that did not help.

I'm using the jq version shipped by CentOS 7:

[root@machine]$ jq --version
jq-1.6

Any suggestions?

Benjamin W.
  • FYI, best practices for using `while read` loops are covered in [BashFAQ #1](http://mywiki.wooledge.org/BashFAQ/001). And while it doesn't make a big difference with `jq`, it's good to get out of the habit of using `cat filename | ...`; some programs (including `sort` and `tail`) can run much faster when they're given a real, seekable file handle instead of a FIFO (aka pipe) that can only be read once front-to-back. – Charles Duffy Mar 23 '20 at 19:15
  • Does this answer your question? [sh read command eats slashes in input?](https://stackoverflow.com/questions/924388/sh-read-command-eats-slashes-in-input) – Charles Duffy Mar 23 '20 at 19:18
  • [Why do backslashes disappear when run through `echo`?](https://stackoverflow.com/questions/10238617/why-do-backslashes-disappear-when-run-through-echo) is another preexisting candidate duplicate (the title doesn't make it clear, but it's the same `while read; do` mistake). – Charles Duffy Mar 23 '20 at 19:19

1 Answer


You have to pass the -r option to read:

while read -r line; do echo "$line" > temp/"$(date +%s%N)".json; done

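Without -r, read treats each backslash as an escape character and drops it, so \" turns into ". With -r, backslashes are kept as-is.

You should also quote your variables; an unquoted $line undergoes word splitting and pathname expansion before echo sees it.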

See the difference:

$ read var <<< 'quoted quotes: \"\"'
$ echo "$var"
quoted quotes: ""
$ read -r var <<< 'quoted quotes: \"\"'
$ echo "$var"
quoted quotes: \"\"

Using -r with read is almost always what you want and really should have been the default behaviour.
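
For a file this large, a slightly more defensive variant might look like the sketch below. It follows the BashFAQ #1 advice from the comments (IFS= plus read -r) and uses printf instead of echo. The function name split_lines and the item_N.json naming are my own placeholders, not anything from the question; I swapped the date call for a counter on the assumption that spawning date once per element would be slow for millions of lines.

```shell
#!/usr/bin/env bash

# split_lines DIR: write each line read from stdin to its own file in DIR.
# (split_lines and the item_N.json naming are illustrative, not from the question.)
split_lines() {
  local outdir=$1 n=0 line
  mkdir -p "$outdir"
  # IFS= preserves leading/trailing whitespace; -r keeps backslashes literal.
  while IFS= read -r line; do
    n=$((n + 1))
    # printf '%s\n' writes the line verbatim, whereas some echo
    # implementations interpret backslash sequences.
    printf '%s\n' "$line" > "$outdir/item_$n.json"
  done
}

# Intended use with the file from the question (requires jq and books.json):
#   jq -c '.books[]' books.json | split_lines temp
```

Passing the file name to jq directly also avoids the extra cat mentioned in the comments.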

Benjamin W.