I have thousands of JSON files that I need to combine into a single CSV file. Sample JSON file: https://www.codepile.net/pile/dZGybzwy

I am trying to do this with a Python script because the sets of JSON files are saved in separate folders with this structure: /Saved/09_24_2020_15_05_5_Town03/targets, /Saved/09_22_2020_15_58_8_Town03/targets, etc. The JSON files are located in each targets folder, and I would like to use a command-line flag to select which folder the JSON files are read from.

I referred to similar questions already asked here, such as "saving json files into one single csv" and "Convert multiple .txt files into single .csv file (python)", but the issue I'm having is that the .csv file that is created does not contain any of the contents of the JSON files. I would also like the CSV file to have the same name as the parent folder in Saved, e.g. 09_24_2020_15_05_5_Town03.csv

This is the code I've written:

import pandas as pd
import json
import csv
import argparse
import os
import os.path

from carlahelp.filehelp import read_json

df = pd.DataFrame()
for file in os.listdir('Saved/'):
    if file.endswith('.json'):
        frame=pd.read_json(...)
        df = df.append(frame, ignore_index=True)

if __name__=="__main__":
    data_dir = 'Saved/'
    all_recordings = os.listdir(data_dir)
    if len(all_recordings)==0:
        print("No recordings found in Saved/ folder")
        exit()

    parser = argparse.ArgumentParser()
    #default to latest recording if none specified
    parser.add_argument('-r', '--recording', type=str, default=all_recordings[-1])
    args = parser.parse_args()

    print("Saving to csv {} from {}".format(args.recording + '.csv', data_dir))
    df.to_csv(os.path.join(data_dir,args.recording+'.csv'))

I'm using Ubuntu 18. Would greatly appreciate any help and advice :) Thanks!
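
A note on why the CSV comes out empty: with the folder layout described above, `os.listdir('Saved/')` returns the recording folder names rather than the JSON files inside each `targets` folder, so nothing passes the `.endswith('.json')` check and `df` is never filled. The loop also runs before the `--recording` flag is parsed. Below is a minimal sketch of one way to restructure this, assuming each JSON file holds a single object; the function name `combine_recording` and its parameters are made up for illustration, and `pd.json_normalize` is used here in place of `DataFrame.append`, which was removed in pandas 2.0.

```python
import json
from pathlib import Path

import pandas as pd


def combine_recording(saved_dir, recording):
    """Combine every JSON file in <saved_dir>/<recording>/targets into one CSV.

    Hypothetical helper: assumes each file contains one JSON object.
    """
    targets = Path(saved_dir) / recording / "targets"

    rows = []
    # sorted() gives a stable file order; glob only matches *.json files
    for path in sorted(targets.glob("*.json")):
        with path.open() as fh:
            rows.append(json.load(fh))

    # json_normalize turns the list of dicts into a table and flattens
    # nested objects into dotted column names
    df = pd.json_normalize(rows)

    # Name the CSV after the recording folder and write it next to it
    out_path = Path(saved_dir) / f"{recording}.csv"
    df.to_csv(out_path, index=False)
    return out_path
```

Calling `combine_recording('Saved', args.recording)` after `parser.parse_args()` would then write, for example, `Saved/09_24_2020_15_05_5_Town03.csv`. If the sample files turn out to hold arrays or deeply nested data, the loading step would need adjusting.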

CC25
  • Have you checked to ensure that the data you want is present in `df` before you try to save it as a `.csv`? By printing `df.head()`, for instance? – James Tollefson Nov 05 '20 at 18:05
  • Just a note on `os.listdir()`, from [the docs](https://docs.python.org/3.8/library/os.html#os.listdir): *"The list is in arbitrary order."* So if you want to default to latest, you'd need to wrap that in a `sorted()` call. – C14L Nov 05 '20 at 18:28
  • @JamesTollefson I get "NameError: name 'df' is not defined" when printing df.head() or print(df) Could you please help me figure out where my code is wrong? – CC25 Nov 11 '20 at 04:37
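
Following up on the `os.listdir()` ordering comment: a plain `sorted()` on these folder names would order them by month first, because the names start with `MM_DD_YYYY`, so a recording from January 2021 would sort before one from September 2020. The sketch below picks the latest recording by parsing the leading timestamp instead. It assumes the folder names follow the pattern `MM_DD_YYYY_HH_MM_SS_<label>` (the last number being seconds is a guess from the two example names), and the helper names are hypothetical.

```python
import os
from datetime import datetime


def recording_time(name):
    """Parse the leading timestamp of a recording folder name.

    Assumes names look like MM_DD_YYYY_HH_MM_SS_<label>; raises
    ValueError for names that do not match.
    """
    stamp = "_".join(name.split("_")[:6])
    return datetime.strptime(stamp, "%m_%d_%Y_%H_%M_%S")


def latest_recording(data_dir):
    """Return the name of the most recent recording folder, or None."""
    # Keep only directories, so CSV files already written into
    # data_dir are not mistaken for recordings
    names = [
        n for n in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, n))
    ]
    return max(names, key=recording_time, default=None)
```

`latest_recording('Saved/')` could then replace `all_recordings[-1]` as the `default=` value for the `--recording` argument.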

0 Answers