
I have a directory with a large number of JSON files. I want Python to read them all in and create a single JSONL output file.

A similar post (Python conversion from JSON to JSONL) covers a related task, but my starting point is different: I want to read the JSON files in to create Python objects first, and then convert those objects into JSONL.

halfer
paul_on_pc

1 Answer


Here's how to read the JSON files from a directory in Python and write the loaded data to a single JSONL file:

import json
import os

directory = '/Path/To/Your/Json/Directory'  # Specify your JSON directory path here

json_list = []  # Collects the parsed content of every JSON file
for dirpath, subdirs, files in os.walk(directory):
    print(dirpath)  # Show which directory is being scanned
    for file in files:
        if file.endswith(".json"):
            print(file)  # Show which file is being loaded
            with open(os.path.join(dirpath, file)) as json_file:
                data = json.load(json_file)
                json_list.append(data)

# Now write each parsed object to its own line of a single JSONL file
with open('output.jsonl', 'w') as outfile:
    for entry in json_list:
        json.dump(entry, outfile)
        outfile.write('\n')
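
Since the question mentions a large number of files, it may be worth avoiding the intermediate list. The following is only a sketch of that variation, not part of the approach above: each file is still parsed into a Python object first, but it is written out immediately instead of being collected. The function name `merge_json_to_jsonl`, its parameters, the sorted traversal order, and the UTF-8 encoding are all assumptions made for illustration.

```python
import json
import os


def merge_json_to_jsonl(directory, output_path):
    """Write every .json file under `directory` as one line of `output_path`.

    Hypothetical helper: the name, signature, and UTF-8 encoding are assumptions.
    Returns the number of JSON files written.
    """
    count = 0
    with open(output_path, "w", encoding="utf-8") as outfile:
        for dirpath, subdirs, files in os.walk(directory):
            subdirs.sort()  # visit subdirectories in a stable order
            for name in sorted(files):
                if not name.endswith(".json"):
                    continue
                with open(os.path.join(dirpath, name), encoding="utf-8") as json_file:
                    data = json.load(json_file)  # parse into a Python object first
                # With no indent argument, json.dumps emits no raw newlines,
                # so each object occupies exactly one line of the output.
                outfile.write(json.dumps(data) + "\n")
                count += 1
    return count
```

It could be called as `merge_json_to_jsonl('/Path/To/Your/Json/Directory', 'output.jsonl')`. Only one parsed object is held in memory at a time, which is the main reason to prefer this shape when the directory is large.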
Karthick Mohanraj