0

I'm trying to convert a 7z file content listing to JSON, and I can't fix the missing separator between the converted output blocks.

I'm a bit of a newbie at JSON conversion, but I found that jq could do the job. I read the jq documentation and found examples here and there, and elsewhere, but no solution.

Please find the use case:

The command line:

    jq -f pf_7z.jq -R < demo.lst

The input file demo.lst:

       Date      Time    Attr         Size   Compressed  Name
    ------------------- ----- ------------ ------------  ------------------------
    2018-06-23 14:02:16 D....            0            0  Installer
    2018-06-23 14:02:16 .....         3381         1157  Installer\Readme
    2018-06-23 14:02:16 .....         4646         1157  Installer\License.txt
    2018-06-23 14:02:16 .....       138892       136152  Installer\Setup.exe

The filter file pf_7z.jq:

    def parse:
      def parse_line:
        . | map(match("(\\d+-\\d+-\\d+) (\\d+:\\d+:\\d+) (D|.).* +(\\d+) +(\\d+) +(.*\\\\)([^\\\\]*)\\.(.*)"))
        | .[]
        | { "date": (.captures[0].string),
            "time": (.captures[1].string),
            "attr": (.captures[2].string),
            "size": (.captures[3].string),
            "path": (.captures[5].string),
            "name": (.captures[6].string),
            "extn": (.captures[7].string) };
      split("\n") | ({} + (parse_line));
    parse

The expected result should be:

{ "date": "2018-06-23", "time": "14:02:16", "attr": ".", "size": "4646", "path": "Installer\", "name": "License", "extn": "txt" }, { "date": "2018-06-23", "time": "14:02:16", "attr": ".", "size": "138892", "path": "Installer\", "name": "Setup", "extn": "exe" }

And all I got was:

{ "date": "2018-06-23", "time": "14:02:16", "attr": ".", "size": "4646", "path": "Installer\", "name": "License", "extn": "txt" } { "date": "2018-06-23", "time": "14:02:16", "attr": ".", "size": "138892", "path": "Installer\", "name": "Setup", "extn": "exe" }

without the comma separator between blocks.

Thanks ;-)

madum
  • 3
  • 2
  • Oups! command line not fully documented: jq -f pf_7z.jq -R < demo.lst – madum Feb 07 '19 at 11:39
  • Don’t you want to produce valid JSON? It would make sense to produce a valid JSON array, or a valid CSV row, or YAML, or TOML ... – peak Feb 07 '19 at 13:09
  • Yes, that's what I expected: a valid JSON result to use later on with JSONedit. And JSONedit reported the following error: Failed to parse text. *Line 10, Column 1: Unexpected text after closing bracket. See Line 10, Column 1 for detail – madum Feb 07 '19 at 14:14

1 Answer

0

Your def for parse_line produces a stream of JSON entities, whereas you evidently want a JSON array. Using your regex, you could write:

    def parse:
      def parse_line:
        match("(\\d+-\\d+-\\d+) (\\d+:\\d+:\\d+) (D|.).* +(\\d+) +(\\d+) +(.*\\\\)([^\\\\]*)\\.(.*)")
        | .captures
        | map(.string)
        | { "date" :.[0],
            "time" :.[1],
            "attr" :.[2],
            "size" :.[3],
            "path" :.[5],
            "name" :.[6],
            "extn" :.[7] } ;

      [inputs | parse_line];

    parse

Invocation

    jq -nR -f pf_7z.jq demo.lst

Alternative regex

The regex fragment (D|.).* does not make much sense. You should consider replacing it by (.)[^ ]* or some such.
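The redundancy is easy to see in any compatible regex engine. A minimal sketch in Python (whose `re` module handles these simple fragments the same way as jq's regex engine) shows that the `D` alternative adds nothing, and that `(.)[^ ]*` pins the match to the attribute column:

```python
import re

attr_field = "D...."

# (D|.) is no more selective than (.): the "." alternative already
# matches "D" (and every other character), so the "D" branch is dead weight.
assert re.match(r"(D|.)", attr_field).group(1) == "D"
assert re.match(r"(.)", attr_field).group(1) == "D"

# (.)[^ ]* captures the first attribute character, then consumes the
# rest of the attribute column; since that column never contains a
# space, the match cannot spill into the size fields.
m = re.match(r"(.)[^ ]*", attr_field)
print(m.group(0), m.group(1))  # D.... D
```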

A simpler solution

    def parse_line:
      capture("(?<date>\\d+-\\d+-\\d+) "
      + "(?<time>\\d+:\\d+:\\d+) "
      + "(?<attr>.)[^ ]* +"
      + "(?<size>\\d+) +\\d+ +"
      + "(?<path>.*\\\\)"
      + "(?<name>[^\\\\]*)\\."
      + "(?<extn>.*)");

    [inputs | parse_line]
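For comparison only (it is not part of the jq solution), here is a Python sketch of the same named-capture approach, using the listing lines from the question's demo.lst; `re.match` plays the role of jq's `capture`, and collecting the matches into a list is the analogue of `[inputs | parse_line]`:

```python
import json
import re

# Same named groups as the jq capture() filter above.
PATTERN = re.compile(
    r"(?P<date>\d+-\d+-\d+) "
    r"(?P<time>\d+:\d+:\d+) "
    r"(?P<attr>.)[^ ]* +"
    r"(?P<size>\d+) +\d+ +"
    r"(?P<path>.*\\)"
    r"(?P<name>[^\\]*)\."
    r"(?P<extn>.*)")

listing = [
    "2018-06-23 14:02:16 D....            0            0  Installer",
    "2018-06-23 14:02:16 .....         3381         1157  Installer\\Readme",
    "2018-06-23 14:02:16 .....         4646         1157  Installer\\License.txt",
    "2018-06-23 14:02:16 .....       138892       136152  Installer\\Setup.exe",
]

# Lines with no backslash or no dot (the directory entry and Readme)
# fail to match and are dropped, just as in the question's expected
# output, which contains only License.txt and Setup.exe.
records = [m.groupdict() for m in map(PATTERN.match, listing) if m]
print(json.dumps(records))  # one valid JSON array, not a stream of objects
```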

An alternative approach

From the comment about JSONEdit, it seems likely to me that your overall approach might be suboptimal. Have you considered using jq on its own rather than jq together with JSONEdit?

peak
  • 105,803
  • 17
  • 152
  • 177
  • I'm not a JSON expert and I just discovered jq a few days ago; I have been using JSONedit for a few months, as I had to check multiple data conversions to JSON. I saw through this example the power of jq, and I will continue my training on it to extend my usage of it. Many thanks Peter for your pedagogic approach ;-) – madum Feb 08 '19 at 05:31