
I'm running on a Mac and have a very large .json file with more than 100k objects.

I'd like to split the file into many files (preferably 50-100).

SOURCE FILE

The original .json file is one large array of objects and looks a bit like this:

[{
    "id": 1,
    "item_a": "this1",
    "item_b": "that1"
}, {
    "id": 2,
    "item_a": "this2",
    "item_b": "that2"
}, {
    "id": 3,
    "item_a": "this3",
    "item_b": "that3"
}, {
    "id": 4,
    "item_a": "this4",
    "item_b": "that4"
}, {
    "id": 5,
    "item_a": "this5",
    "item_b": "that5"
}]

DESIRED OUTPUT

If this were split into three files I'd like the output to look like this:

File 1:

[{
    "id": 1,
    "item_a": "this1",
    "item_b": "that1"
}, {
    "id": 2,
    "item_a": "this2",
    "item_b": "that2"
}]

File 2:

[{
    "id": 3,
    "item_a": "this3",
    "item_b": "that3"
}, {
    "id": 4,
    "item_a": "this4",
    "item_b": "that4"
}]

File 3:

[{
    "id": 5,
    "item_a": "this5",
    "item_b": "that5"
}]

Any ideas would be greatly appreciated. Thank you!

Brandon

3 Answers


Perl to the rescue:

#!/usr/bin/perl
use warnings;
use strict;

use JSON;

my $file_count = 5;  # You probably want 50 - 100 here.

my $json_text = do {
    local $/;
    open my $IN, '<', '1.json' or die $!;
    <$IN>
};
my $arr = decode_json($json_text);
my $size = int(@$arr / $file_count);  # base chunk size
my $rest = @$arr % $file_count;       # this many files get one extra element

my $i = 1;
while (@$arr) {
    open my $OUT, '>', "file$i.json" or die $!;
    my @chunk = splice @$arr, 0, $size;
    ++$size if $i++ >= $file_count - $rest;  # the last $rest files are one longer
    print {$OUT} encode_json(\@chunk);
    close $OUT or die $!;
}
choroba
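The same divide-into-N-files arithmetic can also be sketched in Python using only the standard library. This is a minimal sketch, not from the answer above; the filenames `1.json` and `fileN.json` just follow the Perl script's naming:

```python
import json

def split_chunks(arr, file_count):
    """Yield `file_count` nearly equal slices of `arr`; the last
    len(arr) % file_count slices each get one extra element."""
    size, rest = divmod(len(arr), file_count)
    pos = 0
    for i in range(1, file_count + 1):
        chunk_len = size + (1 if i > file_count - rest else 0)
        yield arr[pos:pos + chunk_len]
        pos += chunk_len

def split_file(path, file_count):
    # Load the whole array, then write file1.json .. fileN.json.
    with open(path) as fh:
        arr = json.load(fh)
    for i, chunk in enumerate(split_chunks(arr, file_count), start=1):
        with open(f"file{i}.json", "w") as out:
            json.dump(chunk, out)

# split_file("1.json", 50)  # pick 50-100 for a 100k-object file
```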

@choroba's answer is very effective and flexible. I have a bash solution with jq:

#!/bin/bash
i=0
file=0
# Read one compact JSON object per line; `while read` avoids the
# word-splitting that a `for` loop over command output would cause.
while IFS= read -r f; do
   if [ "$i" -eq 2 ]; then
         jq --slurp '.' /tmp/0.json /tmp/1.json > "File$file.json"
         rm /tmp/0.json /tmp/1.json   # cleanup
         ((file = file + 1))
         i=0
   fi
   printf '%s\n' "$f" > "/tmp/$i.json"
   ((i = i + 1))
done < <(jq -c -M '.[]' data.json)
# Flush any leftover objects (one or two) into a final file.
if [ "$i" -gt 0 ]; then
    leftovers=()
    for ((j = 0; j < i; j++)); do leftovers+=("/tmp/$j.json"); done
    jq --slurp '.' "${leftovers[@]}" > "File$file.json"
    rm "${leftovers[@]}"   # cleanup
fi
sozkul
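If the array fits in memory, the temp-file round-trips above can be avoided entirely. Here is a Python sketch of the same fixed-size grouping; the filenames `data.json` and `FileN.json` mirror the script above, and `chunk_size` is an assumption you would tune to land in the 50-100 file range:

```python
import json

def split_fixed(path, chunk_size):
    """Write the array in `path` out as File0.json, File1.json, ...
    with `chunk_size` objects per file (the last file may be shorter)."""
    with open(path) as fh:
        arr = json.load(fh)
    for n, start in enumerate(range(0, len(arr), chunk_size)):
        with open(f"File{n}.json", "w") as out:
            json.dump(arr[start:start + chunk_size], out)

# split_fixed("data.json", 2000)  # ~2000 per file gives 50 files for 100k objects
```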
$ cat tst.awk
/{/ && (++numOpens % 2) {
    if (++numOuts > 1) {
        # print "}]" > out
        print out, "}]"
        close(out)
    }
    out = "out" numOuts
    $0 = "[{"
}
{
    # print > out
    print out, $0
}


$ awk -f tst.awk file
out1 [{
out1     "id": 1,
out1     "item_a": "this1",
out1     "item_b": "that1"
out1 }, {
out1     "id": 2,
out1     "item_a": "this2",
out1     "item_b": "that2"
out1 }]
out2 [{
out2     "id": 3,
out2     "item_a": "this3",
out2     "item_b": "that3"
out2 }, {
out2     "id": 4,
out2     "item_a": "this4",
out2     "item_b": "that4"
out2 }]
out3 [{
out3     "id": 5,
out3     "item_a": "this5",
out3     "item_b": "that5"
out3 }]

Once you've tested and are happy with it, switch the script to write to the files: change `print out, $0` to `print $0 > out` (i.e. uncomment `# print > out`), and likewise change `print out, "}]"` to `print "}]" > out`.

Ed Morton
  • Thank you, Ed. I think this is very close. It prints correctly in my terminal while testing, but when I remove the `print out, $0` and uncomment the `# print $0 > out`, the end of out1 and out2 is being printed in terminal but is not included in the files. The `}]` is being left off and is just printed in the terminal. Any ideas on how to resolve? Thank you! – Brandon Jul 26 '16 at 19:14
  • You must've copy/pasted wrong or uncommented wrong. The script I posted **will not** do what you describe. If you edit your question to show the script you are running we can help you debug it. – Ed Morton Jul 26 '16 at 20:14
  • This will fail if any key or value contains a `{` character. – Jordan Running Jul 27 '16 at 00:48
  • 1
    Absolutely. It's not a JSON parser, all it can do is parse the specific input the OP shared with us. – Ed Morton Jul 27 '16 at 12:49