
I am currently using this Miller command to convert a CSV file into a JSON array file:

mlr --icsv --ojson --jlistwrap cat sample.csv > sample.json

It works fine, but the JSON array is too large.

Can Miller split the output into many smaller JSON files of X rows each?

For example if the original CSV has 100 rows, can I modify the command to output 10 JSON Array files, with each JSON array holding 10 converted CSV rows?

Bonus points if each JSON Array can also be wrapped like this:

{
  "instances": 

//JSON ARRAY GOES HERE

}
TinyTiger

1 Answer


You could run this:

mlr --c2j --jlistwrap put -q '
  begin {
    @batch_size = 1000;                        # records per output file
  }
  index = int(floor((NR-1) / @batch_size));    # 0-based batch number
  label = fmtnum(index, "%04d");               # zero-padded, e.g. 0003
  filename = "part-" . label . ".json";
  tee > filename, $*                           # write the record to its batch file
' ./input.csv

This writes a new file (part-0000.json, part-0001.json, and so on) for every 1000 records.
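Miller's --jlistwrap only wraps each file in a bare JSON array, so the bonus {"instances": ...} wrapping would need a post-processing step. A minimal sketch in shell, assuming the part-NNNN.json files from the command above already exist in the current directory (the sample file here just stands in for Miller's output):

# Create a sample part file standing in for one of Miller's outputs
# (assumption: each part-NNNN.json holds a single JSON array).
printf '[\n  { "a": 1 },\n  { "b": 2 }\n]\n' > part-0000.json

# Wrap every part file's array in an {"instances": ...} object.
for f in part-*.json; do
  printf '{\n  "instances":\n%s\n}\n' "$(cat "$f")" > "wrapped-$f"
done

# Inspect the result
cat wrapped-part-0000.json

Each wrapped-part-NNNN.json then holds the original array nested under the "instances" key, matching the format sketched in the question.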

aborruso