How do I read the first 100 lines of a json metadata file and write them to a smaller json file? [Python]

Question

I have a JSON metadata file of about 26 GB. Because it is too large to work with directly, I want to extract the first 100 lines into a new, smaller JSON file. That way I can test and debug my code on those 100 lines and then, with as few changes as possible, run the same code on the whole file.

I have read about exporting JSON to CSV, but I want to keep the JSON structure and file type. Is this possible with Python?

My file is JSON with some extra data, so I use a workaround to read it in the first place. It looks like this:


{"_id":{"$oid":"5b9fd47507b317551a7bfb8f"},"title":"It's Okay If You Didn't Like 'Boyhood', And Here Are Many Reasons Why","url":"https://m.huffpost.com/us/entry/6694772","article_text"

And I read it like this:

import json

with open('metadata.json', 'r') as data:
    # Join the one-object-per-line records into a single JSON array before parsing
    data = json.loads("[" + data.read().replace("}\n{", "},\n{") + "]")

Thanks!

Harsha Biyani · Answer 1 · 2019-12-04T17:50:44.020


You can try:

import json

chunk_size = 100

with open('file.json') as ip_file:
    # json.load parses the whole file into memory and expects a single JSON array
    o = json.load(ip_file)

# Write only the first chunk_size records to one output file
with open('new_file.json', 'w') as out_file:
    json.dump(o[:chunk_size], out_file)
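
If loading all 26 GB at once is not practical, another option is to copy only the first lines without parsing them. Below is a minimal sketch, assuming the file really holds one JSON object per line (which the `replace("}\n{", ...)` workaround in the question suggests). The helper names `copy_first_lines` and `read_records`, the output name `metadata_sample.json`, and the generated demo data are all made up for illustration.

```python
import json
import os
import tempfile
from itertools import islice


def copy_first_lines(src_path, dst_path, n=100):
    """Copy the first n lines of src_path to dst_path, streaming line by line."""
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after n lines, so the rest of the source is never read.
        dst.writelines(islice(src, n))


def read_records(path):
    """Parse a file holding one JSON object per line into a list of dicts."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Small demonstration on a throwaway file (a stand-in for the real metadata.json).
tmp_dir = tempfile.mkdtemp()
big_path = os.path.join(tmp_dir, "metadata.json")
small_path = os.path.join(tmp_dir, "metadata_sample.json")

with open(big_path, "w", encoding="utf-8") as f:
    for i in range(1000):
        f.write(json.dumps({"_id": {"$oid": str(i)}, "title": f"Article {i}"}) + "\n")

copy_first_lines(big_path, small_path, n=100)
sample = read_records(small_path)
print(len(sample), sample[0]["title"])
```

Because the lines are copied verbatim, the small file keeps the same layout as the big one, so the reading code from the question should work on both without changes.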

edited Dec 04 '19 at 17:50

answered Dec 04 '19 at 12:45

Harsha Biyani


You can use `range` in Python 3. I have modified the answer accordingly – Harsha Biyani Dec 04 '19 at 12:49

https://stackoverflow.com/questions/94935/what-is-the-difference-between-range-and-xrange-functions-in-python-2-x – Bart Dec 04 '19 at 12:49
Hi. I tried your code, but the output is 1300 file_N.json files, so I don't understand what is happening :( I wanted one single file containing the first 300 lines of the big file, not separate ones... Thank you! – rtz Dec 04 '19 at 15:38
@AlessandraRizzo: Please check now. I have modified the answer – Harsha Biyani Dec 04 '19 at 17:51
