4

I have a JSON file exported from MongoDB that looks like this:

{"_id":"99919","city":"THORNE BAY"}
{"_id":"99921","city":"CRAIG"}
{"_id":"99922","city":"HYDABURG"}
{"_id":"99923","city":"HYDER"}

There are about 30,000 lines, and I want to split each line into its own .json file. (I'm trying to transfer my data onto a Couchbase cluster.)

I tried doing this:

cat cities.json | jq -c -M '.' | \
while read line; do echo $line > .chunks/cities_$(date +%s%N).json; done

but it seems to drop a lot of lines: running this command gave me only 50-odd files when I was expecting about 30,000!

Is there a reliable way to do this without dropping any data, using whatever tool would suit?

peak
cmdv
  • BSD `date` doesn't support `%N` as a fraction of a second. You are losing lines because you are only generating a unique output file name once per second, and you are processing far more than one line per second. – chepner May 27 '17 at 16:10
  • Convertcsv.com has a tool to split CSV, text, or JSON Lines/ND files, see: https://www.convertcsv.com/text-split.htm – dataman Mar 10 '22 at 16:47
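
As the comment above explains, the timestamp-based name is not unique per line. A minimal sketch of the same loop with a plain counter instead of `date`, assuming one JSON document per input line; the function name `split_lines` and its arguments are hypothetical, not from the original post:

```shell
# Hypothetical helper: write each line of $1 to its own file in directory $2,
# naming the files <prefix>1.json, <prefix>2.json, ...
split_lines() {
  in=$1
  outdir=$2
  prefix=$3
  mkdir -p "$outdir"
  n=0
  # IFS= and -r keep leading whitespace and backslashes intact;
  # the || clause also handles a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    # The counter is unique per line, unlike a once-per-second timestamp.
    printf '%s\n' "$line" > "$outdir/${prefix}${n}.json"
  done < "$in"
}
```

It could be called as `split_lines cities.json .chunks cities_`. Because the loop reads from a redirect rather than a pipe, the counter is not lost in a subshell.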

2 Answers

6

Assuming you don't care about the exact filenames, if you want to split input into multiple files, just use split.

jq -c . < cities.json | split -l 1 --additional-suffix=.json - .chunks/cities_
Michael Mior
  • This doesn't quite work; I'm getting `split: illegal option -- -` :( – cmdv May 27 '17 at 12:37
  • Worked it out: on OS X you need to install coreutils with `brew install coreutils`, then use `gsplit` instead of `split` :) – cmdv May 27 '17 at 13:25
  • If you *really* don't care about the output file names, `gsplit` is only necessary for the `--additional-suffix` option. – chepner May 27 '17 at 16:08
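
For systems where `split` lacks `--additional-suffix`, one possible workaround is to split with only the `-l` and `-a` options and add the `.json` extension afterwards. This is a sketch, not the answer author's method; the function name `split_then_rename` is hypothetical, and it assumes the output directory holds no earlier `cities_*` files:

```shell
# Hypothetical helper: split $1 into one-line files in directory $2,
# then append .json to each resulting file name.
split_then_rename() {
  in=$1
  outdir=$2
  mkdir -p "$outdir"
  # -a 4 gives four-letter suffixes (aaaa, aaab, ...), i.e. room for
  # 26^4 = 456976 files, comfortably more than 30,000 lines.
  split -l 1 -a 4 "$in" "$outdir/cities_"
  # Assumes the directory contained no cities_* files before the split.
  for f in "$outdir"/cities_*; do
    mv "$f" "$f.json"
  done
}
```

It could be called as `split_then_rename cities.json .chunks`.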
1

In general, splitting any text file into one file per line, using any awk on any UNIX system, is simply:

awk '{close(f); f=".chunks/cities_"NR".json"; print > f}' cities.json
Ed Morton
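
The `close(f)` call in the awk one-liner matters because each `print > f` opens a new file, and without closing the previous one awk can run out of file descriptors well before 30,000 lines. A sketch that wraps the same technique and then checks that nothing was dropped; the function name `split_with_awk` is hypothetical, and the check assumes the input ends with a newline and the output directory starts out empty:

```shell
# Hypothetical helper: one file per input line via awk, followed by a
# sanity check comparing the input line count with the output file count.
split_with_awk() {
  in=$1
  outdir=$2
  mkdir -p "$outdir"
  awk -v dir="$outdir" '{
    close(f)                          # close the previous output file
    f = dir "/cities_" NR ".json"     # NR is the current line number
    print > f
  }' "$in"
  lines=$(wc -l < "$in" | tr -d ' ')
  files=$(ls "$outdir" | wc -l | tr -d ' ')
  # Succeeds only when every input line produced exactly one file.
  [ "$lines" -eq "$files" ]
}
```

It could be called as `split_with_awk cities.json .chunks && echo "all lines written"`.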